ABSTRACT

Two computer information systems, one for reference
retrieval and the other for the retrieval of numerical data on atomic
and molecular physics, are being developed at the Queen's University
of Belfast. The reference retrieval system is based on the atomic and
molecular physics section of the INSPEC tapes. The tapes contain
complete abstracts of the documents which are incorporated into the
system, and from these a thesaurus of relevant keywords is
constructed. The data base is searched using these keywords. There
are over 7000 documents indexed and the thesaurus contains about 3000
terms. The numerical data system allows a user to retrieve and
manipulate numerical data. At present, the data base consists of
atom-atom potentials which are extracted from the literature. For
each state is stored the available potential data along with a
reference to the source, an estimate of the accuracy and the range of
validity. The potentials can take various forms; for example, a table
of values or the parameters of a formula of known form. At present
six forms of potential curve fits can be accommodated. The fits are
stored in atomic units but can be output in the units chosen by
the user. The system allows the user to manipulate the stored
data. (Author/SJ)

1. INTRODUCTION

1.1 Background

In our studies of on-line information retrieval systems 
over the last five years, we have constantly had in mind the 
situation which we expect will prevail at the end of this 
decade when most scientists, engineers, doctors, lawyers, etc. 
have beside their desks simple and flexible computer terminals 
linked to a local computer and through this local computer 
to a data and computer network beyond. They will use the 
computer terminal every day for either calculations or
information, or both. It may be that most of the time the 
information they store and retrieve will be in a small 
personalised data bank on their local computer, but they 
will also be able to interrogate other data banks using a 
special data network or using the telephone network. We 
take the view that the cost of computing, data storage and 
transmission combined can be lower than the cost of 
maintaining the data, provided that the software controlling 
the information systems is efficient and well designed. 
Because the computing and transmission costs will be low 
and because the biggest cost will be that of maintaining the
quality of the data, we expect that there will be several
data banks in different parts of Europe and the world
specialising in particular areas. For this reason we
chose in Belfast to carry out our research with data in the
field of Atomic and Molecular Physics, because Queen's University,
Belfast is a leading world centre of research in this
discipline. Thus we can draw on the expertise available
in Belfast and when necessary employ the experienced staff
we need for maintaining the data.

We expect that our numerical data system (which is
discussed in Part B of this report) will be the only one
of its type in the world. We aim to make it so cost-
effective that in spite of the telephone or transmission
costs, the system will be worth interrogating from
California or Tokyo. It will be so specific and so
difficult to keep up to date that it will probably not be
worth keeping copies of it in other locations (e.g. U.S.A.).
On the other hand, we expect that our reference system will
be accessed frequently enough to make it expedient to have
copies available at 3 or 4 places in the world (Colorado?
Tashkent? Tokyo?). These other copies will be updated
automatically every week from Belfast. The European data
system will be interrogated irregularly by the workers in
atomic and molecular physics throughout Europe and by others
in different fields, particularly chemists, engineers and
nuclear physicists requiring occasional information on
atomic and molecular physics. Many users will only use the
system once a year or less; few users will use it more
than 5 or 6 times in a year. The searches will normally
be retroactive for a specified number of years. The data
file will contain almost the whole literature in the narrow
field of specialisation. Complete data files on other
subjects will be kept at various places and will be within
the reach of any scientist who can afford to pay the $2 or
$3 fee. (We would hope that to scientists the data would
be free and the system financed by an international body.
At a cost of about $500,000 per annum all workers in atomic
and molecular physics throughout the world could receive a
free service.)

Because the system would be used infrequently by each
scientist and because he cannot be expected to learn and
remember a complex system, using it only 3 or 4 times a
year, the retrieval system will have to be very simple.
This excludes the use of complex Boolean statements and
necessitates some form of question/answer dialogue based on
simple multiple choice questions in natural language, the
user having to initiate as few responses as possible. It
also implies a system which is self-instructive and which
can be used without previous knowledge or experience by
any intelligent user.

It is apparent that a fairly large file of references
would be needed and a rapid response required, not just
because some long distance telephone calls would be involved
but also because a large part of the cost of the retrieval
would be proportionate to the length of the average response
(other factors being equal). We found some five years ago
that it was relatively easy to write an on-line system for
a small number of documents - we were able to demonstrate
one for 100 documents after only 6 months' inexperienced
programming - but it is a great deal more difficult to
write a system which gives a realistic response and cost
for a large number of references. It is also relatively
easy to write a system using a controlled vocabulary and
more difficult to use a free language thesaurus because of
its greater size, the different forms of words, problems
with prefixes, synonyms, etc. However, we feel that,
looking to the future, the free vocabulary is more likely
to meet the needs of the user and it allows automatic indexing
using significant words in an abstract or in part (or later
all) of the text. The cost of professional indexers using
a controlled thesaurus to index a paper is high and the
advantages doubtful because significant terms can be
extracted adequately and much more cheaply by the computer.

All of the factors outlined in the above paragraphs
have influenced us to build and rebuild a system with a free
vocabulary, references indexed by the significant words in
the title and abstract, and the emphasis, up to the present,
put on producing extremely efficient software to give the
fastest response and lowest cost possible to the user. We
have no doubt that once we have mastered the problem of
producing efficient, flexible and modular software which
is proven to deal at low cost with a large number of
references, it will be possible quickly and easily to
adapt this software to produce many forms of retrieval
systems and different query languages.

1.2 Position at 1/1/72

By the end of 1971 we had successfully built and
demonstrated a pilot on-line system for the retrieval of
journal references, taken from current INSPEC tapes. We
had used this pilot system to study and develop efficient
computing techniques for interactive retrieval work. The
pilot project had matured to the point where we could
automatically retrieve information from a small file of
approximately 1000 augmented catalogue records which included
titles and abstracts. In parallel, a secondary database
consisting of titles and chapter headings as well as
bibliographic details of books had been implemented for a
small departmental computer science library, allowing the
user to interrogate and retrieve on-line the stored
information. Detailed descriptions of the above work can
be found in previous annual reports.

As a result of two years' experience with the project
there were a number of enhancements and improvements which
suggested themselves, some of which we have implemented in
the past year. These are described later in this report.

1.3 Position at 1/1/73

The work of adding new records to the data bases of
atomic and molecular physics abstracts and books in the
applied mathematics, physics and computer science libraries
has continued throughout the year. Our main effort has
been devoted to building up the abstracts file which over
the year has increased from 1000 to over 6000 references.
The references are extracted from the Physics Abstracts
section on magnetic tapes prepared by Inspec. The file
covers all papers published in the field over the previous
two years and is now large enough to be useful. In
increasing the data base sixfold, we were presented with
many problems not expected and not encountered in the small
test data file. We describe these at a later stage in the
report.

A new library, called the School of Applied Mathematics 
and Physics Library, which merges the books from the Applied 
Mathematics, Physics and Computer Science departmental 
libraries, has been set up as a branch of the main library 
at Queen's University. Almost all of the 1,000 books in 
this collection have been set up as a secondary data base 
which allows the retrieval of information from the titles 
and chapter headings of books. As new books are acquired 
by this library, their bibliographic details are coded in 
the appropriate format and they are added to the file. 

The updating process of the Inspec records has been
substantially changed from last year. Instead of updating
entirely in an on-line mode, an off-line version is first
applied to all words known to the system, leaving only a
comparatively small number of new words to be
handled on-line. This has speeded up our work considerably
and is discussed later.

The on-line control system, MCS (Multiplexor Control
System), an operating system developed at Queen's University,
Belfast, has been modified to allow more than one program
to access the same data file simultaneously and also to
allow more than one user to access the same program file at
the same time. Each user is given a separate copy of the
same retrieval program.

After a very successful demonstration of the retrieval
system to the School of Applied Mathematics and Physics
library committee, showing on-line retrieval both from the
physics abstracts file and the merged library books, a
Visual Display Unit has been installed in the library for
the convenience of readers who may wish to use the system. As a
result, we think we are now in a much better position to
improve the user interface side of the system.

1.4 Delays

During the year our work has not been held up
significantly, as might be expected by people living outside
Belfast, by the constant sound of bombs and bullets, some
very close indeed! Only a few days' work were lost because
of the "troubles". However, our work was held back
considerably by the poor performance of the hardware,
particularly the large Fixed Disc at the University's Computer
Centre on which our work depended almost totally. Time
after time files which had been built up and edited
painstakingly were lost and the work had to be repeated;
time after time software errors were traced after days of
wasted effort to file corruptions which in theory should
not have been possible if the hardware checking mechanisms had
been working properly. Overall, we estimate that because
of these difficulties the project was over one month behind
schedule at the end of the year. These problems will
continue in 1973; but fortunately the offending device, a
Bryant Fixed Disc, is being replaced at the University
Computing Centre by an ICL exchangeable disc, EDS 60. This
should be more reliable.



1.5 Conferences and Demonstrations 



During the summer of 1972, members of the information 
systems group attended and participated in a variety of 
conferences and meetings. Papers on OSTI supported 
research projects in Belfast were given at the NATO Advanced 
Study Institute in Denmark and the Universities' Computer 
Science Colloquium in Scotland. Copies of these papers and
reports on the conferences are given in the appendices. In 
addition, on-line mechanised information storage and retrieval 
demonstrations, linked to the ICL 1907 at Queen's, were 
successfully displayed in Denmark and Scotland. The Codata 
Conference in France was attended by two members of the group 
and the On-line '72 conference in England was attended by 
one member of the group. Reports of these are also given in 
the appendices. 

In particular, the demonstration in Scotland created a 
lot more interest than anticipated since, unlike the NATO 
one in Denmark, the colloquium held was of a general software 
nature, with information retrieval systems forming only a 
small part of the agenda. However, the talk stimulated 
such interest in the audience that a request was made to 
see a live demonstration of the system. This was arranged 
on the evening of the same day and for two hours three of 
the QUIS group present were busy answering queries of 
interest to an audience of about thirty people who were
given the opportunity of trying out the system for themselves. 
The result was not only pleasing and satisfactory for the 
participants but was encouraging and reassuring for the 
QUIS research members at Queen's. 

Other University departments to link into the system
included the physics department at Stirling University,
the mathematics department at Southampton University and
the main library at Birmingham University. In the
university here in Belfast we have given several
demonstrations - among these was one given to the
Engineering Department. A member of the group gave a talk
and live demonstration at the New Polytechnic in Belfast, both
of which were recorded.

The impact that the information retrieval team has 
created at Queen's has been such that a full term's course 
on Information Retrieval is now included in the syllabus
for postgraduate studies for further degrees. The course 
is being given by the director of the group and will mainly 
deal with the work encountered in implementing our systems. 

2. PROPOSALS ACCEPTED BY OSTI

Prior to the ending of the previous grant, proposals
were submitted by us for a continuation of the work involved
for a further two years from the beginning of 1972. A
summary is given in this section of the parts of this
proposal which OSTI agreed to support.

2.1 Continuation of Pilot Project

By the end of our previous OSTI grant period in December
1971, we had already spent some months in building up a small
file of titles and abstracts on Atomic and Molecular Physics
from the Inspec tapes; the size of the pilot file was then
approximately 1000 records. The system had been designed
to index and retrieve automatically from a data base of
1,000 to 10,000 references. By the end of the proposed two
year period of the present grant it is expected that the
data base will grow to around 10,000 references. In the
process of this expansion a number of modifications and
improvements are to be added to the system. These include
the following:

(a) Off-line Indexing

An alternative version of the indexing program which
will work mainly off-line is to be set up. This will work
in a similar manner to the indexing program used to create
the already existing small file. It will index each word
in an abstract automatically if the word has arisen previously
but new words, instead of being displayed on the
teletypewriter, will be stored on a temporary file on backing
store. They will later be printed out on the line-printer
with either a reference to, or the full text of, the abstract
from which they come. The indexer will then process these
words and, using a housekeeping program to retrieve them
from their temporary store, he can add them to the thesaurus
and update their associated entry lists, etc. on-line. This
version of the indexing program will have the advantage
that it will use mainly cheaper off-line overnight runs
and the indexer's time will not be wasted sitting at the
teletypewriter while large portions of records are indexed
automatically. The program can also be adapted to give
a printout, when desired, of each abstract and a list
of all the terms arising from that abstract. The facility
will be useful for keeping periodic checks on the indexing
and stemming of words. The original indexing program was
written for on-line use, not because we thought it was
better to do it that way, but because it gave us our
first experience of on-line programming. We believe that
off-line indexing, as we now propose, will be cheaper and
result in better indexing.

(b) Prior Dictionary

In many subject fields people have already put
considerable effort into building up thesauri. We feel
that we can utilise some of this effort by providing a
facility for incorporating a particular dictionary, or part
of a dictionary, into any of our files before or after
we start indexing any documents. To date the whole thesaurus
is built from terms as they occur in the references which
are indexed; in future it will be from both.

(c) Synonyms

So far our methods of dealing with synonyms have
been simple and designed primarily to be only a means of
dealing with different forms of the one word - e.g.,
electron, electrons, electronic. In the new system the
user can, having presented a word to the system, be
presented with a list of related words, all of which he
can decide to use in his search, depending on the context
of his query. The difficulty is the number of questions
we must put to the user to get this information from him,
which delays the total response of the system and puts up the
cost. We also want to allow the user to include any
synonyms he needs that have not been presented to him by the
system. The system will store these synonyms and at a
later stage an indexer can examine them and decide whether
or not to link them within the thesaurus. This 'related
word' package will necessitate substantial changes to our
thesaurus structure and hence to our indexing and retrieval
programs. However, we feel the resultant enhancement of our
system will be considerable if it does not turn out to be
too costly and time-consuming to the user.

(d) Housekeeping

We need the development of a set of basic housekeeping
programs so that files can be promptly and efficiently edited on-
line and off-line. This will enable us to check for and
amend operator errors, to back up the security measures and
to allow inspection and modification of files as required.
We also need programs to provide an alphabetical printout of
the thesaurus indicating which words have been tagged as synonyms
and which are related words.

(e) Security

The present security precautions [are to be] extended and enlarged
to [illegible] regular [illegible] of portions of files
[illegible] system in its latest possible [illegible] breakdown.
We will also have to keep a [illegible] when our planned off-line
[illegible] becomes operative, to allow [illegible].

(f) Analysis of System

A statistical analysis of our systems will be
continued with programs developed to analyse our pilot
projects. This analysis has already given us information
on methods which will improve the efficiency of our software
(see our paper: "Disc Access Algorithms"). In particular,
it helped to compress the data within the area where we
store records. This study of the basic design of the
system will continue throughout, including statistical studies
based on new information emanating from the system as it
grows. We expect that the results of these technical
studies will be continuously fed into the system, improving
it and making it more efficient. One of these studies
will be a survey of text compression techniques and an attempt
to improve on previous methods using a statistical survey
of the frequencies of pairs, triads, etc. of characters in the
text.
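
The kind of frequency survey proposed here can be illustrated by a
minimal sketch. It is written in Python, which is not the language
of the project, and the function name and sample text are
hypothetical; it simply counts every pair or triad of adjacent
characters in a piece of text.

```python
from collections import Counter


def ngram_frequencies(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical sample text, not taken from the Inspec file.
sample = "electron electrons electronic"
pairs = ngram_frequencies(sample, 2)
triads = ngram_frequencies(sample, 3)

# The most frequent groups are the natural candidates for replacement
# by a single short code in a compression scheme.
print(pairs.most_common(3))
print(triads.most_common(3))
```

A real survey would be run over the whole abstracts file, and the
choice of which groups to encode would depend on the character set
and word length of the machine.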

2.2 Query Formulation

As our data base grows larger and new facilities are added
the response of the system to the user will change and the
query language or command language, as it is variously known,
which was suitable for 1000 documents will not necessarily
be suitable for a file of 10,000 documents. Appraisal
of the user reaction to the system will be used to decide
where it is necessary to modify this system. We are
fortunate to have a large group of potential users of our
system at Queen's University many of whom are familiar
with the use of computers and who work in the area of Atomic
and Molecular Physics, including members of our information
group working on the Databank supported by OSTI. Even
though our initial system may have many teething problems
this group of users are likely to give constructive
criticisms and so we shall be able to build up a worthwhile
assessment of the system.

2.3 Reasons for OSTI support

The above proposals (plus others we will mention
in assessing our present position at the end) were acceptable
to OSTI. Support was thus given for the continuation of 
the project with emphasis on the query formulation side 
in order to allow the information side of the work (as 
opposed to the computing side) to be strengthened and 
related to user needs before the system is developed further. 
It was also felt necessary to evaluate the results of the 
pilot study in detail in order to demonstrate the applicability 
of the work outside Belfast and to show that duplication 
does not occur with other systems. 

3. PROBLEMS WITH EXPANSION

3.1 Off-line Indexing

In the early stages of the development of our project we found that
the most efficient method of indexing records was in an on-line mode.
This was due to the fact that our thesaurus was small and so each
record contained many new words. These had to be indexed and added
to the thesaurus. However, during the past year our data base has
increased six-fold; the thesaurus has also grown so that new records
now contain very few words which are strange to the system. We have
therefore changed our indexing program to work in an off-line mode.
Each word in an abstract is indexed automatically if the word has
arisen previously, but words not previously in the thesaurus, instead
of being displayed on a teletypewriter, are stored on a temporary
file on backing store.

When a sufficient number have accumulated another housekeeping program
retrieves them from their temporary store and is used on-line to
record indexing decisions about those new words. The result is that
the number of abstracts which can be added to the file each week has
considerably increased. The indexer can now make decisions about the
significance and stemming of words in his own time and then input them
rapidly at the teletype console. This new version of the indexing program
has the further advantage that it uses mainly cheaper overnight runs.
Indeed the operators can run the program as a background job any time
there is [illegible]K of spare core in the machine as the only peripheral it
uses after loading is the disc store.
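
The two-stage procedure described above can be sketched as follows.
This is only an illustration in Python under simplifying
assumptions: the thesaurus and entry lists are held as in-memory
dictionaries rather than disc files, words are not stemmed, and all
the names (index_offline, apply_decisions, pending) are hypothetical
and are not those of the actual programs.

```python
def index_offline(records, thesaurus, entry_lists):
    """Off-line run: index known words, hold back the unknown ones.

    records     -- dict: record number -> abstract text
    thesaurus   -- dict: word -> True (significant) or False
    entry_lists -- dict: significant word -> list of record numbers
    Returns the (word, record number) pairs awaiting a decision.
    """
    pending = []  # stands in for the temporary file on backing store
    for number, abstract in records.items():
        for word in abstract.lower().split():
            if word not in thesaurus:
                pending.append((word, number))
            elif thesaurus[word]:
                entry_lists.setdefault(word, []).append(number)
    return pending


def apply_decisions(pending, decisions, thesaurus, entry_lists):
    """On-line step: record the indexer's decisions on the new words."""
    for word, number in pending:
        thesaurus[word] = decisions[word]
        if decisions[word]:
            entry_lists.setdefault(word, []).append(number)


# Hypothetical data: two short abstracts and a two-word thesaurus.
thesaurus = {"the": False, "electron": True}
entry_lists = {}
records = {1: "the electron potential", 2: "the potential"}

pending = index_offline(records, thesaurus, entry_lists)
apply_decisions(pending, {"potential": True}, thesaurus, entry_lists)
print(entry_lists)
```

The point of the split is that index_offline needs no operator and
can run overnight, while apply_decisions is the only part that
occupies the indexer at the console.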

3.2 A Prior Dictionary

At present our dictionaries of words in the data files
are built up as the records are updated; to delete all the
records currently in the file and replace them with others
would require the generation of a new dictionary, the old one
being obsolete. In parallel with the maintenance of the
present system, we are restructuring the dictionary to allow
us to insert words before they are encountered in the
updating phase. Thus we can take a set of words and use it
for many sets of records for which they are relevant. For
example, we could store our present set of Inspec records on
magnetic tape, and build a file from the tapes to be issued in
1973 with practically no indexing labour.



3.3 Synonyms 

The words in the present dictionary have one tag, the
address of their entries; a zero address indicates that the
word is non-significant and synonymous words have the same
entries address. In the new version we are giving 6 tags
(the number is a parameter) to each word. One tag is used
to link synonyms in a circular chain and another is used to
classify a word as significant but with no entries. The
application of the dictionary to a new set of records would
only require the automatic resetting of these tags.

The increased number of tags will allow us to include 
a hierarchical structure if this is considered desirable. 
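
The circular chain of synonyms can be illustrated by a short sketch
in Python. The class and function names are hypothetical and only
the one tag concerned is modelled; the actual dictionary holds its
tags in fixed positions within disc buckets.

```python
class Word:
    """A dictionary word with a single synonym tag (other tags omitted)."""

    def __init__(self, text):
        self.text = text
        self.next_synonym = self  # a word alone is a chain of length one


def link_synonyms(a, b):
    """Merge the chains containing a and b.

    Assumes a and b are not already in the same chain; swapping the
    tags of two words in one chain would split it instead.
    """
    a.next_synonym, b.next_synonym = b.next_synonym, a.next_synonym


def synonyms_of(word):
    """Walk the chain once, starting and ending at word."""
    found = [word.text]
    current = word.next_synonym
    while current is not word:
        found.append(current.text)
        current = current.next_synonym
    return found


electron = Word("electron")
electrons = Word("electrons")
electronic = Word("electronic")
link_synonyms(electron, electrons)
link_synonyms(electron, electronic)
print(synonyms_of(electrons))
```

Because the chain is circular, the full set of synonyms can be
reached from whichever member the user happens to type, and no
member needs to be singled out as the preferred term.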

3.4 Housekeeping

3.4.1 Alphabetical printout of thesaurus

During the year the program ALPH was written to give
an alphabetical list of all the words in the thesaurus or in
any required section of the thesaurus. The location of
each word given by bucket and word number is also printed
out. For each significant word in the thesaurus ALPH records
the number of times that it has occurred and the position of
its entry list. Using this program we can detect any errors
which may have occurred in the entry lists. An alphabetic
listing of the words also shows which words have been linked
as synonyms and any incorrectly spelt non-significant words
which may be deleted from the thesaurus. ALPH can also be
used to obtain a list of the words in the order in which
they are stored, i.e. in ascending bucket and word number.
Using the program an alphabetic printout of the dictionary
as it existed in our Inspec disc file in June 1972 was
recorded in a Special Report, SR6.

3.4.2 Pre-editing Entry Lists 

After studying the QUOBIRD system it became apparent
that the speed of the retrieval stage depended to a large
extent on the amount of list processing carried out.
Therefore if any operation could be carried out at the
indexing stage, when time is not as important and the action
is only required once, and not at the retrieval stage, when
speed of response is important, the system user would benefit.
One such case was found after the long-list package (see 1971
Annual Report) was written: after an entry list has been
picked up in the retrieval stage, first it is sorted and
repeats are thrown out, that is, cases where a word had
appeared twice in the same sentence in a record. When
this operation takes place in core the time involved is not
very great, but when the entry list is too large to fit
into core quite a number of disc accesses will be needed.
In order to avoid this problem a simple housekeeping program
was written which checked through existing files, sorted all
entry lists and threw out repeats. It then only required
a minor modification to the indexing program to check, each
time a new reference was added to an entry list, if the last
entry in the list referred to the same record and sentence
and, if so, not to update the entry list. In other words
repeat headings are never placed in the entry list and so
there is no need to eliminate them. It is recognised that
the above procedure loses information about the frequency
of occurrence of a word, which is important in a weighting
system. This difficulty will be overcome in a new version
of the system, which is almost complete.
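
The modification to the indexing program amounts to a single
comparison with the last entry in the list. A minimal sketch in
Python follows; the function name and the representation of an entry
as a (record, sentence) pair are assumptions made for illustration.

```python
def add_posting(entry_list, record, sentence):
    """Append (record, sentence) unless it repeats the last entry.

    Records are indexed in order, so any repeat of a word within one
    sentence must be adjacent to the previous entry for that word.
    Returns True if the entry list was updated.
    """
    if entry_list and entry_list[-1] == (record, sentence):
        return False
    entry_list.append((record, sentence))
    return True


# Hypothetical occurrences of one word: twice in sentence 1 of
# record 1, once in sentence 2 of record 1, once in record 2.
postings = []
for record, sentence in [(1, 1), (1, 1), (1, 2), (2, 1)]:
    add_posting(postings, record, sentence)
print(postings)
```

As noted in the text, the count of occurrences within a sentence is
lost by this check; a weighting scheme would need it kept elsewhere.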

3.5 File Security Problems

In this section we describe some of the file security problems that
we currently experience and the efforts we make to overcome them.
We find that files get corrupted, either wholly or partially, due to
hardware failures, operating system failures, and human shortcomings.
It requires a lot of effort to maintain a pure data base and we find
that much of our time is spent not in adding anything new to the
system, but in preserving intact the files already created. We do
not feel that for the purposes of this report it is necessary to
dwell on the variety of hardware and operating system software
faults that occur and the reasons for their occurrence except to
say that the large Fixed Disc Store on which most of our information
is stored has been the prime offender.

We feel it is more important here for us to illustrate the kind of 
file corruption problems we do encounter, whatever their reasons, 
and to say what we do to remedy them. 

To deal with complete file corruption, back-up copies of three tapes
per file are kept in a separate building from the computer. The
frequency of these tape copies depends on the frequency of file
updates; a file which is changed every day is copied to tape twice
a week, whereas a file which is changed only once a week is copied
every fortnight. A more difficult problem arises when a portion of
a file, perhaps a single bucket or even a character, is corrupted.
It can happen that this kind of corruption remains undetected in
the normal course of events for a considerable time, and even outlives
the grandfather, father, son security precautions we take.

To demonstrate more fully this kind of trouble, we first recall
the filing structure and organisation of the system. It is a three
level filing structure which is controlled by a directory at the
start. One file is used for the augmented catalogue reference records,
another for the list of dictionary or thesaurus terms, which may or
may not be subject index terms (depending on their judged relevance),
and finally one for the set of entries or postings, which stores
in a more random fashion, each occurrence of every index term as
a catalogue record is being processed.

A particular hazard is when one or two of the entries in the random
postings file get destroyed and we have no way of telling this has
happened. Sometimes this type of trouble can remain dormant for
a while and then suddenly spread and create havoc. Whilst it is not
possible to completely solve the problem we do have a program that
has proved successful in detecting the trouble. It works as follows:
it takes each index term in the dictionary and checks through its
associated entries to test for inconsistencies and looping; it prints
an error message if these occur. This program is run at frequent
intervals and each time there is a suspicion something has happened
to a file. As a further measure of keeping the entries file free
of trouble, each time an entry is being included, checks for
inconsistencies are made. For example, the last word in every bucket
is kept empty and should the program find something in this word an
error message is printed out to that effect and the program halted.
Again each entry to this data file requires two empty computer words.
If the indexing program finds some place on the random file where
this condition does not hold then again appropriate action is taken.
Another security measure is one capable of dealing with sudden machine
failures, which (unless care is taken) can leave a program in an
undetermined state. This can spell trouble, particularly for our
indexing system; built-in flags which can be set and then cleared
at appropriate points can take care of this trouble. The first
thing our indexing program does as soon as it is initiated is to
check if such a breakdown has taken place - if so the program then
knows to take the necessary steps. This safeguard was described
in last year's annual report.
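
The checking program can be pictured with a small sketch. The layout
used here is an assumption made for illustration only: each dictionary
term holds the address of its first posting, and each posting holds a
document number and the address of the next posting, with 0 marking the
end of a chain. The names check_chain and check_dictionary are
hypothetical and are not taken from our software.

```python
def check_chain(term, head, postings):
    """Walk one term's chain of postings and return a list of error messages.

    postings maps an address to (document_number, next_address);
    a next_address of 0 marks the end of the chain (assumed convention).
    """
    errors = []
    seen = set()
    address = head
    while address != 0:
        if address in seen:
            # The chain has come back to an entry already visited.
            errors.append(f"{term}: chain loops back to address {address}")
            break
        if address not in postings:
            # The chain points at an entry that is no longer there.
            errors.append(f"{term}: address {address} is missing or destroyed")
            break
        seen.add(address)
        _document, address = postings[address]
    return errors


def check_dictionary(dictionary, postings):
    """Run check_chain for every index term; an empty list means no trouble found."""
    errors = []
    for term, head in sorted(dictionary.items()):
        errors.extend(check_chain(term, head, postings))
    return errors
```

Keeping a set of the addresses already visited is what allows looping
to be reported rather than followed for ever.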

A useful aid that we use is to keep one word in every bucket to
record the bucket number itself. Sometimes we find that a program
can read the wrong bucket and fail to show this on the reply word. Thus
without realising anything is wrong the program amends the wrong
bucket and proceeds to write it back to the "right" place. By having
our built-in bucket numbers this error can be controlled. Another
useful practice we have included is to keep our retrieval file separate
from our updating file. This is done by keeping a file for weekly updates
in which we attempt to clear any rubbish before it is merged with the
retrieval file when a new weekly file for updates commences. We find
that we can detect trouble in the smaller file more easily than if it
went straight on to the master file. Finally we have built in the dates of
updates to files so that when indexing commences a message is first
printed out to say when the file was last processed. This helps
particularly when files on disc are recreated by the Computer Centre
whose job it is to manage all application files.
The Computer Centre may recreate a file from its own tape copies after
a failure, but not realise that between the time when it took the
tape copy and when the failure happened the file was edited. Noting
the date of the last update or edit makes sure that errors cannot
occur from this source.
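
The built-in bucket number can be sketched as follows. This is a
minimal illustration under assumed conventions: a bucket is a short
list of words, the first word holds the bucket number, and the disc
is stood in for by a dictionary. The names make_bucket and read_bucket
and the bucket size are hypothetical.

```python
BUCKET_WORDS = 8  # words per bucket in this toy example (assumed size)


def make_bucket(number, payload):
    """Build a bucket whose first word records its own bucket number."""
    words = [number] + list(payload)
    words += [0] * (BUCKET_WORDS - len(words))  # pad with empty words
    return words


def read_bucket(disc, number):
    """Fetch a bucket and refuse to use it if its stored number disagrees."""
    bucket = disc[number]
    if bucket[0] != number:
        raise OSError(f"asked for bucket {number} but read bucket {bucket[0]}")
    return bucket
```

A program which checks the stored number before amending a bucket
cannot silently write the wrong bucket back to the right place.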

4. COMMAND LANGUAGE IMPROVEMENTS

4.1 Introduction

Whilst we are fortunate to have a large group of 
potential users of our systems at Queen's University, 
particularly in the field of atomic and molecular physics, 
we, nevertheless, have to make certain fundamental efforts 
to induce them to use the system. Our first priority has 
been to keep the data bases up to date. Also to facilitate 
users, we have sent out circulars telling them about the 
service offered and, in addition, we have enclosed simple 
and pictorial illustrations on how to use the consoles (see 
Appendix). Apart from this we have spent a fair share
of our time giving demonstrations and talks on the system. 
Before enlarging on the improvements we have made to 
accommodate our user audience, we first give a summary of 
the information retrieval system using the physics abstracts 
database as it was at the end of 1972. 

4.2 Description of on-line retrieval system for Physics abstracts 

ABSTRETR is a system designed to retrieve references on Atomic 
and Molecular Physics. For 1971 and 1972 the sections taken from 
Physics Abstracts are:- 

13.00 Atomic and Molecular Physics

13.20 Atoms

13.23 Hydrogen and Helium Atoms
13.25 Isotopes

13.30 Molecules

13.31 Inorganic Molecules
13.37 Intermolecular Mechanics

From January 1975 sections 5.2 and 5.4 of Physics Abstracts
will be added to the database. At present there are over
6,000 abstracts stored in the system.

In the retrieval program words which describe the subject
on which information is required, i.e. search terms or 'keys',
are matched against the titles and texts of abstracts. 'Keys'
which may be used for retrieving include chemical elements 
and compounds, experimental processes, mathematical procedures, 
abbreviations, e.g. RKR, SCF, personal names (where they 
occur within a title or abstract) and chemical bonds between 
atoms represented thus: H-C-N. 

The linking of words as synonyms is confined to alternative
spellings, e.g. sulphur and sulfur, and the symbols
for chemical elements with the name of the element itself
unless the symbol may also be a common word such as: IN; AS;
BE. Pairs of words which are sometimes written with a
hyphen joining them and sometimes as one complete word are
also linked, but there is no link where the words of the
pair are written separately. Thus "auto-ionisation" and
"autoionisation" are linked but not "near-threshold" and
"near threshold".

To recall the maximum number of documents about a
subject the searcher must use 'keys' for all the possible 
variations in describing the subject. Some information about 
members of a group may be found by searching under the group 
name as well as the individual member, e.g. to look under 
halogens as well as chlorine. 

To limit the number of documents recalled the 'keys' 
chosen should preferably not include frequently occurring 
words such as electron, atom, molecule but should describe 
precisely the element, compound and procedure required. The
number of documents recalled by some frequently occurring
words can be reduced:

(a) by adding a prefix to the key, e.g. L-shell,
    PI-electron

and (b) by using the word in a phrase which must occur
    within one sentence of an abstract to be recalled,
    e.g. auger electron spectroscopy.

NOTE If the words of the phrase are used as separate
    keys the program will find the number of abstracts
    in which they occur together; the words may then be
    in different sentences.
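
The difference between separate keys and a phrase can be shown with a
short sketch. It is an illustration only, and it assumes that the words
of a phrase need merely fall within the same sentence, in any order,
and that sentences end at full stops. The function names are
hypothetical.

```python
import re


def sentences(abstract):
    """Split an abstract into sentences on full stops (a crude rule)."""
    return [s.strip() for s in abstract.split(".") if s.strip()]


def words(text):
    """Lower-case words of a text; hyphenated words are kept whole."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def has_all_keys(abstract, keys):
    """Separate keys: every key occurs somewhere in the abstract."""
    present = set(words(abstract))
    return all(k.lower() in present for k in keys)


def has_phrase(abstract, phrase):
    """Phrase search: all words of the phrase occur within one sentence."""
    wanted = set(words(phrase))
    return any(wanted <= set(words(s)) for s in sentences(abstract))
```

An abstract reading "Auger spectroscopy was used. The electron yield
was measured." is recalled by the three separate keys but not by the
phrase "auger electron spectroscopy".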

4.3 Applied Mathematics, Physics and Computer Science Library

During the past year the books of each of the three 
departmental libraries consisting of applied mathematics, 
physics and computer science have been merged. As stated 
earlier almost all of the 1,000 books in this collection are 
now in a computerised data base allowing retrieval of 
information from their titles and chapter headings. However, 
with the merger, a problem arises because each of the three 
separate libraries uses its own classification scheme. The
cataloguing department of the main library has undertaken to 
change this to the appropriate Library of Congress class 
mark to conform with the rest of the main university library. 
In anticipation of this impending change, the field allotted 
to the reference or classification number in the record for 
each book in the applied mathematics and physics books file 
has been left blank. The correct numbers have not yet 
been inserted as their allocation is not yet complete. The 
computer science books, which were processed first, had 
their own reference numbers. A program, MARK, has been 
written to put the new class mark in place of the old one.

4.4 Hyphenated Words

The first main change in indexing to be made during the year
was the procedure for dealing with hyphenated words. Initially the
indexing program treated the words on either side of the hyphen as
separate words. This method was not entirely satisfactory if one
of these words is non-significant when used on its own. Perhaps
the most obvious example of this is the term "on-line" where "on"
cannot be indexed as significant. Other problems arose from the
inconsistency in the use of hyphens in the text of abstracts, e.g.,
"ultraviolet" appears in this form as often as in the alternative
"ultra-violet". This means that when retrieving records containing
"ultra-violet" it would be necessary to use "ultraviolet" and
"ultra-violet" as two separate keys. There were so many examples
of these problems that it was felt necessary to alter the indexing
method. The new version of the program, which stores both words of
a hyphen pair, with and without the hyphen, removes the two difficulties
described above; the program checks the thesaurus for the previous
appearance of each word of the pair and presents them for judgment if
this is the first appearance. The words are considered as a pair
and also in relation to each other, overcoming the problem arising
if one of them should be non-significant. As the words are also considered
without the hyphen being present the problem of inconsistency is eliminated.
This change in the program highlighted the great variety of ways in which
hyphens are employed in abstract texts. There are numerous examples of
three words joined by two hyphens, e.g. time-of-flight; fuel-to-oxidant.
Somewhat less frequent in occurrence are four words joined by hyphens
and very occasionally five, e.g. electron-acceptor-electron-donor;
valence-shell-electron-pair-repulsion. There are also words followed by
a hyphen and without a second word attached, e.g., boron-, carbon-,
nitrogen-, fluorine-like, or words with a hyphen preceding them, e.g.,
-inimo or even -/-quark. Complex inorganic compounds
usually have names complicated by hyphens as in trans-bis
(diphenyl-o-selenolatophenylphosphine). Personal names
used to describe procedures are often paired together by a
hyphen. This leads to some odd looking "words" in the
thesaurus where the names are linked without a hyphen but
doesn't affect the retrieval process. This method of dealing
with hyphens makes it easier to include bonds between atoms
or radicals as retrieval terms, e.g. information about the
links H-C-N can be found more precisely than by intersecting
H, C and N separately.
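
The treatment of a hyphenated word can be sketched as below. This is
an illustration under stated assumptions, not the indexing program
itself: the judgment of significance, which in our system is made by
the indexer, is replaced here by a supplied set of non-significant
words, and the names hyphen_forms and index_key are hypothetical.

```python
def hyphen_forms(token, non_significant=frozenset()):
    """Return the index entries generated for one hyphenated token.

    Leading or trailing hyphens (as in "boron-") are dropped first.
    The separate parts are kept only if judged significant; the pair is
    always kept, both with and without the hyphen.
    """
    parts = [p for p in token.lower().split("-") if p]
    if len(parts) < 2:
        return parts
    entries = [p for p in parts if p not in non_significant]
    entries.append("-".join(parts))   # e.g. "ultra-violet"
    entries.append("".join(parts))    # e.g. "ultraviolet"
    return entries


def index_key(word):
    """Key under which a word is filed: hyphens are ignored."""
    return word.lower().replace("-", "")
```

Because the hyphen is ignored when a word is filed, "ultra-violet"
and "ultraviolet" lead to the same thesaurus entry, while "on-line"
can be indexed without "on" ever appearing as a term in its own right.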

4.5 Suppression of superfluous printing during retrieval

After a user has retrieved a set of documents he can ask 
for either the references, text or sentences to be printed out. 
There is unfortunately no mechanical way of stopping the program
if it is printing out a lot of irrelevant material. A line limit
is imposed by the operating system MCS which returns the user to
monitor level after a predetermined number of lines and he may
then resume the printout at the position at which it was terminated
or he may reload the program and start the search again.
This is not a very satisfactory solution as most users want to 
return to the search but omit the printing of the references. 
It was decided that the best way round this problem was to 
initiate a counter each time printout of either references, text 
or sentences was requested. Then every time five references (or 
five abstracts or sentences) have been printed the user is asked 
if he wishes to continue the printout, start a new search, exit 
from the program or return to the stage he had reached in his 
search before he asked for a printout. 
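
The counter can be sketched as follows. The sketch is illustrative
only: the reply words "continue", "new search", "exit" and "return",
and the function name print_in_groups, are assumptions and not the
wording of the actual command language.

```python
def print_in_groups(items, ask, show=print, group=5):
    """Print items, pausing after every `group` items to ask what to do next.

    `ask` is called with a prompt and returns one of "continue",
    "new search", "exit" or "return" (assumed reply words).
    Returns the reply that ended the printout, or "done" if all were printed.
    """
    for count, item in enumerate(items, start=1):
        show(item)
        # Ask only when a full group has been printed and more remain.
        if count % group == 0 and count < len(items):
            reply = ask("Continue printout, new search, exit or return?")
            if reply != "continue":
                return reply
    return "done"
```

The caller decides what each reply means; the routine itself only
guarantees that no more than five items are printed without the user
being consulted.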

In the near future it is planned to include in the system 
a special query to the user if he asks for an unusually long 
printout. This will warn him what he is doing and advise him
to have it printed out instead in an overnight run on the line
printer (see next section).

4.6 Off-line Printout

As the size of the data-base grew (at present 6500 records), 
so the number of documents retrieved by any key usually increased 
also. Whilst some users are prepared to use further keys to 
narrow their search down to a small number of documents, in other
cases the user may want to retrieve a large number of documents
and view them all. As it is very tedious to examine the abstracts
of a large number of documents on-line it was decided to provide
users with the option of printing their references and abstracts
off-line. From the software point of view this entailed providing
a special file in which users' names, addresses and a
list of the references to be printed were stored. Every night
this file is examined and the required references printed out
and the file is cleared ready for re-use the following day.

5. ADDITIONAL RESEARCH

5.1 FORTRAN or Assembly Language

The software for the QUOBIRD system was originally written in the ICL
1900 assembly language, PLAN. This can only be used on the ICL 1900 range
of computers and is difficult to amend, so, as a first step towards re-writing
the whole system in the more universally used language FORTRAN, it was
decided to rewrite part of it and to compare the efficiency of this code
with that of the original PLAN programs.

Fortran is a language designed specifically for manipulation of
numerical data, and as such does not have any special facilities for
storing and handling text or character data. It is not therefore on the
face of it particularly well suited to writing programs which involve large
amounts of character manipulation. However we overcame this deficiency by
storing four characters to one 24-bit word, and treating this as an integer,
in COMPRESS INTEGER mode. We found that character handling was greatly
facilitated by the use of two Fortran standard subroutines, COMP and COPY.
COMP compares two character strings for equality, and COPY copies a
character string from one location into another.
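
The packing of four characters into one 24-bit word can be illustrated
as follows. The sketch is written in Python rather than Fortran, and
the character set is a toy one chosen for the example; it is not the
ICL 1900 internal code. The names pack, unpack and comp are
hypothetical, comp being only in the spirit of the COMP subroutine.

```python
import string

# Toy character set; codes must fit in 6 bits (0..63). Not the real ICL 1900 code.
CHARSET = " " + string.ascii_uppercase + string.digits + "-.,"
CODE = {ch: i for i, ch in enumerate(CHARSET)}


def pack(text):
    """Pack text, four 6-bit characters per word, into a list of 24-bit integers."""
    text = text.upper()
    text += " " * (-len(text) % 4)            # pad the last word with spaces
    words = []
    for i in range(0, len(text), 4):
        word = 0
        for ch in text[i:i + 4]:
            word = (word << 6) | CODE[ch]     # KeyError if ch is outside CHARSET
        words.append(word)
    return words


def unpack(words):
    """Recover the (space-padded) text from packed words."""
    chars = []
    for word in words:
        for shift in (18, 12, 6, 0):
            chars.append(CHARSET[(word >> shift) & 0o77])
    return "".join(chars)


def comp(a, b):
    """Equality test on the packed form: the words are compared as integers."""
    return pack(a) == pack(b)
```

Once text is held in this way, comparing or copying a string reduces
to comparing or copying a few integers, which is the kind of operation
Fortran handles well.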

When we had written the main QUOBIRD software in Fortran, its
efficiency was compared with that of the PLAN version by setting up a very
small data base (6 books on Quantum Mechanics) using each set of programs
in turn. We found that the Fortran programs took on average approx.
2.6 times as much CPU time and used approx. 2.4 times as much core storage
as the PLAN programs. We felt that this drop in efficiency was probably
acceptable as far as the data base generation programs are concerned,
since they are essentially used only once for each batch of documents
which is added to the system. It was felt that the loss of efficiency
caused by using the Fortran version of this part of the software was
offset by the advantages gained, e.g., ease of modification of these programs
when they are coded in Fortran.

However as the augmented catalogue retrieval programs are being used
constantly, their efficiency in terms of mill time and core area used
obviously contributes directly to the overall economic feasibility of the
system. We do not therefore think it would be advisable to use the Fortran
version of the Document Retrieval program in a commercial working system,
because of the resultant substantial drop in economic viability.

Work is also in progress at the moment to write the QUOBIRD data base
generation software in another high level language, PASCAL (Reference 1),
which has excellent character manipulation facilities. To this end the 
PASCAL compiler in use at Q.U.B. has been modified to provide a random 
access file facility and the new PASCAL program is written but not tested 
yet. 

5.2 Efficiency of Hash Indexing 

A study is near completion on the efficiency of the construction of
the inverted file, with particular regard to how this efficiency varies
with the "bucket capacity" used in the thesaurus. A bucket is a block
of data, here of 40 keywords, which can be written to or read from disc
in one operation. Each keyword is placed in its appropriate bucket by a
hash addressing technique based on the division of the binary integer
which represents the first four characters of the word, by the number of
buckets available in the main storage area. If a bucket overflows the
keywords which it contains are split up between it and two overflow
buckets by hashing them again, using the number three as a divisor.

During the retrieval process, each time the thesaurus file is interrogated
means an extra "disc access", and a keyword which has gone to an overflow
bucket needs more than one. Each disc access costs money. It can be seen
therefore that the number of overflows in the thesaurus must be kept to
a minimum in order to keep the running costs of the system as low as possible.
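
The hash addressing can be sketched as below. This is a minimal
illustration under assumptions: the remainder of the division is taken
as the bucket address, the 6-bit character code is a stand-in obtained
by masking the ASCII value, and on overflow a remainder of 0 is taken
to mean that the keyword stays in the original bucket. All function
names are hypothetical.

```python
def first_word(keyword):
    """Integer formed from the first four characters, 6 bits each (assumed code)."""
    value = 0
    for ch in keyword.upper().ljust(4)[:4]:
        value = (value << 6) | (ord(ch) & 0o77)
    return value


def home_bucket(keyword, n_buckets):
    """Division hashing: the remainder selects the bucket."""
    return first_word(keyword) % n_buckets


def split_on_overflow(keywords):
    """Rehash an overflowing bucket's keywords with divisor three.

    Remainder 0 stays in the original bucket; remainders 1 and 2 go to
    the two overflow buckets (an assumed assignment).
    """
    groups = ([], [], [])
    for kw in keywords:
        groups[first_word(kw) % 3].append(kw)
    return groups
```

Since only the first four characters enter the calculation, keywords
such as ATOMIC and ATOMS always fall in the same bucket, both at the
first hashing and after a split.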

The mean number of overflows per record in the inverted file was found to
depend on three factors: the bucket capacity, the packing density, i.e. the
number of records stored as a percentage of the total capacity for records,
and the overflow technique. There are a number of methods for dealing with
the overflow problem, one of which has already been described. Others include
serial overflow (Reference 2), minimum overflow (Reference 3), quadratic overflow
(References 4, 5, 6) and random overflow (Reference 7). It is not necessary
to describe these techniques in detail here; suffice it to say that they
all work on the principle of directing overflow records into a bucket in the
same storage area which is not yet full.

For each of these overflow techniques we simulated one hundred inverted
file systems, with bucket size running from 1 to 100. When each of a number
of predetermined packing densities was reached the number of records which
had overflowed up to that point was recorded, and the quantity a (mean number
of disc accesses required to locate a record in the inverted file) was
computed. For these experiments a random number generator was used to
simulate the random filling up of the thesaurus buckets with keywords by the
hash addressing system. Each experiment was repeated one hundred times, and
the mean was recorded as a reasonable estimate of a in each case. The
probability of error was also determined. The results of this research will
be published shortly.
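
An experiment of this kind can be sketched in a few lines. The sketch
is not the simulation program used in the study: it assumes one
particular reading of serial overflow, namely that a record finding
its bucket full is passed to the next bucket in sequence, and it counts
one disc access for each bucket examined. The function names, the
number of buckets and the seed are hypothetical.

```python
import random


def mean_accesses(n_buckets, capacity, density, rng):
    """Fill buckets at random up to `density` (0 < density <= 1) and return
    the mean number of bucket reads per record under next-bucket overflow."""
    n_records = int(n_buckets * capacity * density)
    load = [0] * n_buckets
    total_reads = 0
    for _ in range(n_records):
        bucket = rng.randrange(n_buckets)    # stands in for the hash address
        reads = 1
        while load[bucket] >= capacity:      # full: try the next bucket
            bucket = (bucket + 1) % n_buckets
            reads += 1
        load[bucket] += 1
        total_reads += reads
    return total_reads / n_records


def estimate(n_buckets, capacity, density, repeats=100, seed=1):
    """Repeat the experiment and return the mean, as an estimate of a."""
    rng = random.Random(seed)
    runs = [mean_accesses(n_buckets, capacity, density, rng) for _ in range(repeats)]
    return sum(runs) / repeats
```

Calling estimate with the same packing density and different bucket
capacities gives a feel for how the mean number of accesses falls as
the buckets are made larger.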

5.3 Machine-Independence of Data Bases and the Programs
which Manipulate them

There are two aspects to the machine-independence, or
portability, of any software system: the portability of the
programs which constitute the system, and the portability of
the data upon which these programs operate. In the case
of an information retrieval system, processing stored data,
clearly both aspects must be taken into consideration. During
the past few years these aspects of portability have been
investigated in a project carried out in the Department of
Computer Science at Queen's University, Belfast.

Consider programs which may manipulate data bases, held 
on auxiliary storage media, in both a sequential and a random 
access manner. Portability requires that a machine-independent
interface be constructed between such programs
and the basic facilities for driving auxiliary storage media
on all computer systems on which they are to be used. An
interface of this nature cannot be completely implemented in
a high-level programming language: a small, well-defined set
of machine-language subprograms is required for each distinct
computer system. In the course of the project, interfaces
have been constructed for the ICL 1900 series and IBM System
360 using a judicious mixture of ANSI Fortran and the
appropriate machine language in each case. These interfaces
have been tested using a locally-developed data processing
program, written in ANSI Fortran, the data base being set up 
from scratch on each computer system. 

The complementary problem of data base portability is 
currently being investigated and takes as its starting point
the existence of a data base manipulated by ANSI Fortran
programs, and set up using the interface briefly described
in the preceding paragraph. Some additional software is
required to convert such a data base into a suitable form on
magnetic tape for transfer to another computer system, and
to perform the converse operation. Preliminary work involving
small amounts of data is being carried out at present, and
it is hoped subsequently to set up and transfer a substantial
data base meeting the requirements outlined above.

Eventually, when the cost of data transmission over 
long distances is reduced to an acceptable level, portability 
of programs and data bases may cease to be of interest to 
the designers and implementers of information retrieval
systems. This will, however, never be true of all software 
systems, and for the immediate future will also continue to 
be an important consideration for those involved with information 
retrieval systems. 



5.4 Data Compression

Due to the large number of abstracts being constantly
added to the data base, the need for more random access
storage is rapidly expanding. It would therefore be
advantageous to have some means of data compression to reduce 
storage costs. This problem of compression has been studied 
by two research students. 

The basic principle involves the substitution in the 
data of single characters for regularly occurring groups of 
letters or symbols, e.g. "IONISATION OF THE ATOMS" can be
reduced from 23 symbols to 10 as (ION)(IS)(AT)(ION)( OF )(THE )
(AT)(O)(M)(S). These groups of characters, composite
characters, vary in length from 2 to 16 characters in our 
system. A subset of the data held on disc, about 200,000 
characters, was scanned and 15 lists of around 200 members 
each were compiled containing the most commonly occurring 
composite characters of each length. The data was then 
compressed using various combinations of these composite 
characters and an estimate of the compression obtained. 
The storage saving was usually between 27% and 55%. This
was disappointingly small and we doubt if the computing 
time needed to pack and unpack the data from its compressed 
form is worth the effort. 
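
The principle can be illustrated with a short sketch. It is not the
program written by the research students: the rule of always taking
the longest composite character which matches at the current position
is an assumption, as is the placing of the spaces within the composite
characters used in the example. The function name compress is
hypothetical.

```python
def compress(text, composites):
    """Greedy longest-match substitution.

    Returns a list of symbols; each symbol is either a composite character
    from `composites` or a single character of the text. Each symbol is
    counted as one stored character.
    """
    by_length = sorted(set(composites), key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(text):
        for group in by_length:
            if text.startswith(group, i):
                symbols.append(group)
                i += len(group)
                break
        else:
            # No composite character matches here: store the single character.
            symbols.append(text[i])
            i += 1
    return symbols
```

With the composite characters "ION", "IS", "AT", " OF " and "THE ",
the 23 characters of "IONISATION OF THE ATOMS" are stored as 10
symbols, and joining the symbols together restores the original text.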

6. ASSESSMENT OF PRESENT POSITION

In assessing the present position of the reference 
retrieval system we look separately at the two main supported 
sections of our work, viz. continuation of the present
project and query formulation, and below we list some of the
enhancements we hope to include in the coming year.

(a) Alternative Retrieval Mode. As mentioned previously,
we designed our retrieval system to be self-explanatory as
far as possible. While we have found this approach to be
very effective with the casual user, we have found that the
more sophisticated and frequent users of the system would
prefer greater flexibility than is available at present in
building up their search queries. They have not, in fact,
been hampered in this by the search logic within the retrieval
program, but by the natural language in which the search
query is formulated. We feel therefore that this type of
user should be provided with a different path through the
search program to enable him to build up more complicated
Boolean expressions easily. For example, we will allow the
use of logical AND and OR in statements, and we will allow
a command "STORE N" in which the list of entries at any
point in the search can be stored and retrieved later by the
command "RETRIEVE N". In this, "N" could take any number
between 1 and 8.
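
The intended behaviour of "STORE N" and "RETRIEVE N" might be pictured
as follows. The sketch is speculative, since these commands are not
yet implemented; the class and method names are hypothetical, and the
inverted file is represented simply as a mapping from each key to a
set of document numbers.

```python
class SearchSession:
    """Holds the current list of entries and up to eight stored lists."""

    def __init__(self, index):
        self.index = index            # key -> set of document numbers
        self.current = set()
        self.stored = {}

    def find(self, key):
        self.current = set(self.index.get(key, ()))
        return self.current

    def and_(self, key):
        """Logical AND: keep only entries which also carry this key."""
        self.current &= self.index.get(key, set())
        return self.current

    def or_(self, key):
        """Logical OR: add the entries which carry this key."""
        self.current |= self.index.get(key, set())
        return self.current

    def store(self, n):
        if not 1 <= n <= 8:
            raise ValueError("N must be between 1 and 8")
        self.stored[n] = set(self.current)   # keep a copy, not a reference

    def retrieve(self, n):
        self.current = set(self.stored[n])
        return self.current
```

A user could thus store an intermediate result, widen or narrow the
search in another direction, and later return to the stored list
without repeating the earlier steps.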

(b) Analysis of Systems. A statistical analysis of our
systems will be continued with programs developed to analyse
our pilot projects. This analysis has already given us
information on methods which would improve the efficiency of
our software. In particular, it helped us compress the
data within the area where we store records. This study
of the basic design of the systems will continue throughout, 
including statistical studies based on new information 
emanating from the system as it grows. We expect that the 
results of these technical studies will be continuously fed 
into the system, improving it and making it more efficient.
One of these studies has already been done in a survey of 
text compression techniques and an attempt to improve on 
previous methods using a statistical survey of the 
frequencies of pairs, triads, etc. of characters in the 
text (see section 5.4).

(c) Indexing and Stemming. We intend to investigate
thoroughly many stemming problems which we have encountered
when indexing our records and the feasibility of solutions
which have been suggested. These include such items as the
indexer needing some form of authority on which to base
decisions, and a minimum stem-length being set according to
the number of characters in a word, e.g. a word of N
characters would not have a stem of less than N-4 characters.
We intend, for each subject we are indexing, to provide a
list of 'danger' words. These would be words that would
always need to be presented in the context of each
particular abstract. Such a word which has arisen in our
work is AL, which arises as the chemical symbol for
aluminium or in the expression 'et al'.
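
The two proposals can be stated compactly in code. This is a sketch
of the rules as described, not of any existing program: the names
stem_allowed, needs_context and DANGER_WORDS are hypothetical, and the
list of danger words contains only the one example given.

```python
DANGER_WORDS = {"AL"}  # words always to be shown to the indexer in context


def stem_allowed(word, stem):
    """A stem is acceptable only if it is a non-empty prefix of the word
    and at most four characters have been removed (length >= N - 4)."""
    return bool(stem) and word.startswith(stem) and len(stem) >= len(word) - 4


def needs_context(word):
    """True if the word must be presented together with its sentence."""
    return word.upper() in DANGER_WORDS
```

Under this rule "ionisa" is an acceptable stem of the ten-letter word
"ionisation" but "ion" is not.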

(d) User Manual. Whilst we have designed our system to
be as self-explanatory as possible we feel that a sophisticated
system will contain facilities which are not apparent to
the casual user and for this reason we feel that the
production of some form of manual is an urgent necessity.
In addition, this manual would also help the user to
understand problems which arise from his interaction with
the operating system, e.g. the user could consult it if he
has trouble logging in or, for example, if he does not
know how to reactivate the teletypewriter if he is timed
out, etc.

(e) Language. Whilst we designed the command language of
our retrieval program with an outside user in mind, this
was not true of either our indexing or housekeeping programs
and the command language of these programs needs to be
modified to help outside users of this software. It is
not self-explanatory: there is no facility by which the
indexer can ask for more information and he cannot, for
example, see the sentence from which a particular term arose
in the abstract or title. The housekeeping programs used
for correcting data already on the disc were written for
our systems programmers and could not easily be used by an
indexer (unless he could write octal instructions!).

Regarding the query formulation side we envisage three 
main ways in which we will assess user reaction to the 
system: - 

(1) By compiling a questionnaire which our initial "test" 
users will be asked to complete after each period of on-line 
searching. We will also approach users personally to 
discuss their reactions. 

(2) To study the conversations between users and the system
we will (with the user's permission) duplicate copies of
conversations on a special teletypewriter or onto disc.

This will require some changes in the operating system to

Summary of the problem

An ever increasing problem which is facing the scientist 
of today is the rate at which new technical data is accumulating.
This growth makes it gradually more difficult for the
researcher firstly to locate any information he requires
and secondly to find it in the form he needs. In particular
it increases the risk of duplication of research with all
the wasted hours that this entails. Four years ago this
group began a study of this question by building a databank
on atomic and molecular physics. It was our objective to
use this as our example to study the problem of on-line
data retrieval as a whole. 

Because of the size of the field of atomic and
molecular physics, it was necessary to restrict our attention
to one particular aspect of it, namely interatomic
potentials. The importance of these is their close
relationship to many of the physical and chemical properties
of matter. Our first aim was to build a system which could
store these potentials and to enable them to be retrieved
"on-line".

As the system was being built, it soon became clear 
that it could be developed one stage further from being 
merely one for retrieval of information to being one which 
allowed manipulation of stored data. A simple example of 
this manipulation is to provide the facility for the 
potentials to be retrieved in whatever units required.
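
Since the potentials are stored in atomic units, retrieval in other
units amounts to multiplying by fixed factors. The following sketch
shows the idea for a table of values; the choice of units offered and
the names used are assumptions made for the example, and only the two
conversion factors are standard physical values.

```python
HARTREE_IN_EV = 27.211386      # 1 atomic unit of energy in electronvolts
BOHR_IN_ANGSTROM = 0.529177    # 1 atomic unit of length in angstroms

ENERGY_UNITS = {"au": 1.0, "eV": HARTREE_IN_EV}
LENGTH_UNITS = {"au": 1.0, "angstrom": BOHR_IN_ANGSTROM}


def convert_table(points, length_unit, energy_unit):
    """Convert (r, V) pairs stored in atomic units to the units requested."""
    length_factor = LENGTH_UNITS[length_unit]
    energy_factor = ENERGY_UNITS[energy_unit]
    return [(r * length_factor, v * energy_factor) for r, v in points]
```

Further units can be offered by adding entries to the two tables of
factors, without touching the stored data.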

The idea of manipulation can however be developed 
much further. When a scientist requires a potential his 
interest is not so much in the potential itself but rather
in using it to calculate parameters which define one of the
physical or chemical properties of matter. An obvious step
then is to build into the system programs which will enable
him to carry out such calculations. To do so there are two
requirements. Firstly, the programs required for these
calculations must be built and added to the system.
Secondly, to enable calculations to be made with a particular
potential, an automatic procedure will be needed to construct
from the various approximations of the potential which have
been stored an estimated potential which is representative of
the most reliable results available. The first part of this
presents no major difficulty but the second does pose serious
problems which are discussed in Chapter 5, Section 2.

Position at 1/1/72



By the end of 1971 several thousand papers had been
examined and out of these some 750 which were considered
relevant had been located and photocopies of them obtained.
Many of these were found on close study to contain data which
was not worth storing, for a variety of reasons, e.g. no range
given, or data replaced by more accurate later data. In
general, less than one paper in three was found to contain data
worth storing.

Approximately 500 assorted worthwhile potentials were contained
in the data bank together with an estimate of the short
range potential for the ground-state of every possible pair
of atoms which does not include a hydrogen atom; that is,
a total of 5,500 potentials in all.

An updating system was in existence which enabled new 
potentials to be added to the system and to take their correct 
position with respect to those already in the bank. 

A smooth representation of the potential over the whole 
range is necessary before manipulative facilities of any 
complexity can be offered. By the end of 1971 a method
existed within the system which fitted an analytic function 
with coefficients chosen to minimise the least square error 
to a function of the logarithm of the potential. However 
the results of this, whilst fairly satisfactory, showed that 
there was still some way to go to solve this problem with 
precision.
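
A fit to the logarithm of the potential can be illustrated in its
simplest form. The analytic function actually used in the system is
not given here, so the sketch assumes, purely for illustration, a
purely repulsive potential of the form V(r) = A exp(-b r), for which
the logarithm of V is a straight line in r. The function name is
hypothetical.

```python
import math


def fit_log_potential(r_values, v_values):
    """Least-squares straight line through (r, ln V); returns (A, b)
    for the assumed model V(r) = A * exp(-b * r). Requires V > 0 throughout."""
    n = len(r_values)
    logs = [math.log(v) for v in v_values]
    mean_r = sum(r_values) / n
    mean_l = sum(logs) / n
    sxx = sum((r - mean_r) ** 2 for r in r_values)
    sxy = sum((r - mean_r) * (l - mean_l) for r, l in zip(r_values, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_r
    return math.exp(intercept), -slope
```

Working with the logarithm keeps the fit from being dominated by the
very large values which a potential takes at short range.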

In the retrieval system a user could retrieve any of 
the stored potential estimates in units of his own choice, 
with or without the associated references, and with the
estimated accuracy and range of validity along with any
other comments included about the potential.



Two other options were also available - the first 
calculated and displayed the most accurate values of the
stored potential in a range specified by the user, while the
second calculated a representation of the potential over the
whole range. In both cases the user could obtain a pseudo-plot
of the values on his remote terminal. Alternatively,
if he wanted a graph of the potential curve, he could store
the data for later output to a graph-plotter, which cannot
be accessed directly from the on-line terminal. Whilst the
system at this stage was fairly flexible in what it offered,
even to the extent of including novel features like on-line
pseudoplots, it nevertheless had a big drawback in that it
did not permit the user to manipulate with whatever potential
he chose himself.

1.3 Position at 1/1/73

By the end of 1972 almost all of the possible relevant
papers have been located. The reading of these papers and
the extraction of interatomic potentials from them is now
virtually complete. A test of the thoroughness of our search
revealed that as many as 94% of relevant papers have been
traced so far, this being before the data base is even completed.
The data extraction is dealt with in greater detail in
section 1 of Chapter 3.

The present report describes the retrieval side of the 
system both from a user's viewpoint and from a system's 
viewpoint. We recommend readers who are only interested in 
the facilities which the system offers to read the first 
section as well as the conversation display illustrating 
what the system does. However, those interested in a detailed 
description of the software involved should also read the 
section on retrieval from a system's viewpoint. 

Whilst we feel we have added a greater degree of user
flexibility to the system over the past years and have
incorporated further manipulation facilities into it we
realise that the user conversational language still requires 
some tidying up: we now believe the stage has been reached 
where this can be looked at more closely. 

2. PROPOSALS ACCEPTED BY OSTI

Prior to the ending of a previous grant, proposals 
were submitted by us for a continuation of the work involved 
for a further two years from the beginning of 1972. A 
summary is given in this section of the parts of this 
proposal which OSTI agreed to support. 

2.1 Data Collection and Authentication 

By the end of the proposed two year period of the grant
the data base of interatomic potentials is to be completed
together with a critical evaluation of the data. A system
to ensure the data is kept up to date is to be set up.
The data base will be broadened with the inclusion of
subsidiary data files in subject areas closely related to
interatomic potentials. These include

(1) Oscillator strengths

(2) Energy levels

(3) Transport properties

(4) Polarizabilities

A start is to be made on building a new data system based
on wave functions. 

2.2 Critical Analysis of Data 

The evaluation of the data will be carried out in two
stages. Initially this will be carried out by us, based
on our own experience. However outside experts employed on
a consultancy basis will be used to consider this problem
as a whole when the data base is complete.

2.3 Management of Data 

Programs to enable alterations to be made to the stored
data are to be written. A number of housekeeping programs
will be needed to implement security procedures and also to
keep accounts of the use of the system.

2.4 An Intelligent Data System 

Programs will be written to enable the user not just 
to retrieve the data but to manipulate it, that is, to 
compute other data with it, to change its form, to generate
related data, etc. Programs suitable for inclusion will 
have to be tested for the accuracy of their results and 
modified to a form compatible with the system. 

2.5 Query Formulation 

It is essential to adapt the system to the needs of 
the user who may either be experienced in using it or just 
a novice. To this end, it is proposed to develop two 
systems, one for each of these users. 

Some means of obtaining the user reaction to the 
system will be developed in order to judge the success 
of it. 

2.6 General On-Line Data Systems 

An investigation as to the viability of applying the
general techniques adopted in the present project to an
on-line data system in other fields such as the medical and
engineering sciences was proposed. However OSTI felt
they could not support this at this stage but may do so at
some later date.



3. DEVELOPMENT OF [...] IN 1972



3.1 Data Extraction

3.1.1 Literature Search 

The potentials which form the data base are found in
papers published in the various scientific journals.
The five different methods of searching the literature were
considered in detail in the 1971 Annual Activity Report
and are listed here without comment:-

(a) "Physics Abstracts": published by the
Institution of Electrical Engineers

(b) "Bibliography of Atomic and Molecular
Processes": published by the Atomic
and Molecular Processes Information
Centre at Oak Ridge National Laboratory

(c) Review Papers and Books

(d) Personal Communications 

(e) Literature references 

By (e) we mean references within one paper to other
papers which may also contain relevant potential data. 
This continues to be a very important source of references, 
particularly to papers dating back to before 1965. 

By the methods outlined in the previous paragraph
some 1330 papers have been located and photocopies of
these made. This includes all the relevant papers up to
the end of 1972 as reported by "Physics Abstracts".

An important question that obviously arises is how
thorough a search has been made. Whilst one can be
confident of an almost 100% success rate for publications
in recent years (say since about 1966) one can be less
certain about earlier papers. This is supported by our
experience as we read through the papers which indicates
that it is the older publications which are harder to
trace. Gradually, however, these gaps should be filled
in as the search proceeds.



A more specific indication of the thoroughness of
the search was obtained from the references quoted in
"A Bibliography of ab initio Molecular Wave Functions"
by W. G. Richards, T. E. H. Walker and R. K. Hinkley.
This book, published in 1971, lists the references for
the best available ab initio calculations for the
potential energies for all atomic pairs. Of \b4
references made to 107 different papers it was found that
only ten had not been located by us.

In an attempt to find ways of improving the efficacy
of the literature search, an examination of the ten
missing papers was valuable. Four of the papers were
dated before the date from which we began our detailed
literature search. As indicated earlier our hope is that
any relevant early papers are gradually found, as indeed
these four have been, through references to them in more
recent publications. This shows that only 6 papers out
of 107 were not found, which indicates a 94% recall at a
time when the data bank was not complete.
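
The recall figure quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function name is ours and is not part of the system described in this report.

```python
def recall_percent(papers_cited: int, papers_missed: int) -> float:
    """Percentage of the cited papers that the search had already located."""
    found = papers_cited - papers_missed
    return 100.0 * found / papers_cited


# Figures quoted in the text: 107 different papers cited, 6 still untraced.
print(round(recall_percent(107, 6), 1))
```

With the figures in the text this gives 101 papers found out of 107, which rounds to the 94% quoted.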

3.1.2 Extraction of Data

The actual data which is extracted from a particular 
paper has been discussed in some detail in previous 
Annual Reports and a list is given here without comment:- 

(1) Paper title and reference

(2) Diatomic system and state name

(3) Type of potential

(4) Method of calculation

(5) Parameters, formula or values which
actually define the potential

(6) Units

(7) Error and source of error

(8) Range; numerical and indication as to
whether it is short, intermediate or
long range

(9) Any other relevant information.

A number of other items of data peculiar to
particular types of potential have been added to this list.
Firstly, for potentials calculated using the Rydberg-
Klein-Rees (RKR) method it is essential to extract De,
the well depth of the potential concerned, and Te, the
height of the potential minimum of the state being stored
above the potential minimum of the ground state.
Secondly, numerical results obtained from ab initio
calculations are mostly given as total energies rather
than potential energies. This requires that the
separated atom energies of the atom pair concerned be
stored for numerical potentials. In cases where the
author already has changed from total energies to
potential energy a value of zero is stored instead of a
separated atom energy.
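
As a rough illustration of the record described above, the extracted items can be pictured as a single data structure, with the separated atom energy used to turn total energies into potential energies. This is a sketch only: the class, field and function names are hypothetical and say nothing about how the data is actually laid out on disc.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PotentialRecord:
    """One stored estimate of a diatomic potential (all names are illustrative)."""
    reference: str                      # (1) paper title and reference
    system: str                         # (2) diatomic system
    state: str                          # (2) state name
    potential_type: str                 # (3) type of potential
    method: str                         # (4) method of calculation
    points: List[Tuple[float, float]]   # (5) (R, energy) values as published
    units: Tuple[str, str]              # (6) (length unit, energy unit)
    error: Optional[str] = None         # (7) error and source of error
    valid_range: Optional[Tuple[float, float]] = None  # (8) numerical range
    range_class: Optional[str] = None   # (8) short, intermediate or long
    comments: str = ""                  # (9) any other relevant information
    well_depth: Optional[float] = None      # De, needed for RKR potentials
    term_energy: Optional[float] = None     # Te, needed for RKR potentials
    separated_atom_energy: float = 0.0  # zero if the author already gives potential energies


def potential_points(record: PotentialRecord) -> List[Tuple[float, float]]:
    """Subtract the separated atom energy so that total energies become potential energies."""
    e_inf = record.separated_atom_energy
    return [(r, e - e_inf) for r, e in record.points]
```

Storing zero when the author has already converted to potential energy means the same subtraction can be applied to every numerical potential without a special case.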

3.1.3 Present Position 

Of the 1330 papers which have been located
approximately 1250 have now been read. From these, data
has been extracted from 340 papers, which is about 25%.
A total of 200 papers have been processed only partially
since they present problems which have as yet not been
resolved (see next section). The number of papers
which have not yet been read is small and it is expected
that these will be processed in weeks rather than months
so that the data base will be up to date by February 1973.

The data from some 200 papers is at present being
tested prior to being added to the potentials already
in the data bank.

3.1.4 Specific problems which have arisen 

(i) As yet we have no means of storing potentials which
have been expressed by the author in the form of an
analytical expression. The forms chosen rarely fall
into a general pattern and the only solution may be to
store each one (of about forty to fifty) separately as
if it were a different type of potential.

(ii) A large number of potentials are presented in a
graphical form and as yet we have tried no method of
reading these accurately. Indeed many of the graphs
given are so small as to make this task extremely
difficult. Nevertheless, with some sixty to seventy
papers with their results presented in this form, this
forms a large quantity of at present unstored data.

(iii) It has been found that in forming potentials of a
parametric nature (e.g. of Lennard-Jones, Kihara, Morse
and Buckingham types) the authors rarely give the range
over which they regard their potential to be valid.
Since our programs require this to be specifically stated
this has meant the choice of some range based on experience.
We have decided on the range

    $$0.9\, r_m < R < 5.0\, r_m$$

where r_m is the internuclear separation at the
equilibrium point. Nevertheless, this choice is still
arbitrary and we feel the whole question to be one which
would be well worth a much fuller investigation.

(iv) It is sometimes not possible to obtain the values
of De, the potential well depth, which is required for
storing RKR potentials. This also applies to the value
of the separated atom energies which is required for
many numerical ab initio calculations. These missing
values will have to be added into the databank as and
when they become available.

(v) The most difficult problem which is met in building
up the data base is still the evaluation of the data;
that is, estimating the accuracy of the values given for
the potential. The need for some sort of evaluation is
readily recognised if the user of the system is to be
able to discriminate between a number of estimates of the
same potential. However, since in most cases the
authors of papers do not give any indication of the
accuracy of their results, it is left to us to provide an
initial evaluation. This is based on a number of
factors; the method of calculation used and a comparison
with results from other sources being the most important.
Often this approach is adequate but at other times it
merely constitutes an educated guess. However, with the
completion of the data base now near, we will soon be
able to improve on our present methods of evaluation,
firstly by carrying out more general and widespread
comparisons and secondly by employing experts to look at
the particular potentials and so carry out a more
realistic assessment of them.
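
The default range adopted above for parametric potentials (0.9 to 5.0 times the equilibrium separation) can be illustrated with a short sketch. The Lennard-Jones form below is written in terms of the well depth and the position of the minimum; the function names and the choice to raise an error outside the range are our assumptions, not a description of the stored programs.

```python
from typing import Tuple


def default_valid_range(r_m: float) -> Tuple[float, float]:
    """Range assumed when the author states none: 0.9 r_m < R < 5.0 r_m."""
    return 0.9 * r_m, 5.0 * r_m


def lennard_jones(r: float, epsilon: float, r_m: float) -> float:
    """12-6 potential with minimum value -epsilon at r = r_m."""
    x = (r_m / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def evaluate_in_range(r: float, epsilon: float, r_m: float) -> float:
    """Evaluate the potential only inside its assumed range of validity."""
    low, high = default_valid_range(r_m)
    if not low < r < high:
        raise ValueError("R lies outside the assumed range of validity")
    return lennard_jones(r, epsilon, r_m)
```

Refusing to evaluate outside the range is only one possible policy; a system could equally return the value with a warning.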

3.1.5 Summary of present position 

Summing up we can say that the creation of a data 
base of interatomic potentials is now almost complete. 
Once this is done it remains only to ensure that the system 
is kept up to date with new publications. Attention can 
then be turned to those other quantities such as oscillator 
strengths, polarizabilities and energy levels which are
to be stored so as to broaden the data bank. 

3.2 Potential Representation



Before manipulative operations of any complexity can
be carried out on the stored potentials a smooth represent-
ation of them over the whole range is necessary. The aim
is to set up a system whereby this curve can be obtained
from the available data describing the intermolecular
potential for any state of any diatomic system.

The available data may be any combination of the
following four:-

(1) The asymptotic form as the internuclear separation
R tends to zero, which is given by

    $$V(R) = \frac{Z_1 Z_2 e^2}{R} + a_0 + a_2 R^2 + a_3 R^3 + \dots$$

where a_0 is the united atom energy and Z_1, Z_2 are the
atomic numbers concerned (Buckingham, 1958).

(2) A short-range part, generally fitted by a Born-Mayer
potential

    $$V(R) = A e^{-bR}$$

the accuracy of which is poor, particularly as R increases.

(3) A set of points distributed around the potential
minimum, some of high accuracy (usually those calcul-
ated using an RKR procedure) and some of fairly poor
accuracy.

(4) The asymptotic form as R → ∞, usually expressed
either in terms of a Van der Waals coefficient or,
more generally, as a long range expansion in inverse
powers of R.
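
Two of the four kinds of data listed above have simple closed forms, sketched below. The Born-Mayer form follows the text; for the long-range part we assume an attractive expansion written as minus the sum of C_n / R^n, with the coefficients held in a dictionary keyed by the power n. That sign convention and the function names are assumptions made for illustration.

```python
import math
from typing import Dict


def born_mayer(r: float, a: float, b: float) -> float:
    """Short-range repulsion V(R) = A exp(-b R)."""
    return a * math.exp(-b * r)


def long_range(r: float, coefficients: Dict[int, float]) -> float:
    """Long-range expansion in inverse powers of R, V(R) = -sum of C_n / R**n."""
    return -sum(c / r ** n for n, c in coefficients.items())
```

A single Van der Waals coefficient is then the special case of a dictionary with one entry.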

As a starting point, it was assumed that the results
using an RKR procedure would be available and these would be
used to represent the potential in the intermediate range
region.

The first attempts were directed at fitting an analytic
form (with a maximum of three parameters) to the RKR points.
This would then be joined, by means of other simple para-
metric functions, on the one hand, to the Born-Mayer potent-
ial, and on the other hand, to the long-range asymptotic
form. For the first part of this scheme, a program was
written to fit in turn each of six parametric potentials
to the RKR points, choosing the best or ending the search
when the average percentage error in the fit was less than
1%. Almost invariably the best fit was given by the Levine
potential:

    [equation illegible in source]

In connection with the parameters, it may be noted that
V' = 0 when x = 1, i.e. when R = R_m, and then V(R_m) = -ε.
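
The selection procedure just described (try each candidate form in turn, keep the best, stop early once the average percentage error falls below 1%) can be sketched as follows. The fitting routines themselves are passed in as callables, since the six forms and the fitting method are not reproduced here; every name in the sketch is hypothetical, and it assumes no data value is exactly zero.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]                   # (R, V) pair
Curve = Callable[[float], float]              # fitted potential V(R)
Fitter = Callable[[Sequence[Point]], Curve]   # returns a curve fitted to the points


def average_percentage_error(curve: Curve, points: Sequence[Point]) -> float:
    """Mean of |fit - data| / |data| over the points, in percent."""
    total = sum(abs(curve(r) - v) / abs(v) for r, v in points)
    return 100.0 * total / len(points)


def choose_best_fit(
    fitters: Sequence[Tuple[str, Fitter]],
    points: Sequence[Point],
    tolerance: float = 1.0,
) -> Optional[Tuple[str, float, Curve]]:
    """Fit each form in turn; stop as soon as one is within the tolerance."""
    best: Optional[Tuple[str, float, Curve]] = None
    for name, fit in fitters:
        curve = fit(points)
        error = average_percentage_error(curve, points)
        if best is None or error < best[1]:
            best = (name, error, curve)
        if error < tolerance:
            break
    return best
```

The early exit means the order in which the forms are tried matters; placing the form that usually wins first would save work.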

Of the different forms tried for the joins, the most
successful were:

    [equation illegible in source]

for the lower join, where b is the Born-Mayer value, and

    [equation illegible in source]

for the upper join.

Each of these contains six variable parameters, which are 
utilised to impose continuity of the potential and its first 
two derivatives at both ends of the join.
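
To show how six free parameters can enforce continuity of a function and its first two derivatives at both ends of a join, the sketch below uses a quintic polynomial in Hermite form. This is a stand-in chosen for illustration: it is not one of the join forms used in the system, whose formulas are not legible here, and the names are ours.

```python
from typing import Callable, Tuple

# (value, first derivative, second derivative) at one end of the join
EndState = Tuple[float, float, float]


def quintic_join(x0: float, left: EndState, x1: float, right: EndState) -> Callable[[float], float]:
    """Quintic matching value, slope and curvature at x0 and at x1."""
    h = x1 - x0
    p0, v0, a0 = left
    p1, v1, a1 = right

    def join(x: float) -> float:
        t = (x - x0) / h
        t2, t3 = t * t, t * t * t
        t4, t5 = t3 * t, t3 * t2
        return (
            p0 * (1 - 10 * t3 + 15 * t4 - 6 * t5)
            + h * v0 * (t - 6 * t3 + 8 * t4 - 3 * t5)
            + h * h * a0 * (0.5 * t2 - 1.5 * t3 + 1.5 * t4 - 0.5 * t5)
            + h * h * a1 * (0.5 * t3 - t4 + 0.5 * t5)
            + h * v1 * (-4 * t3 + 7 * t4 - 3 * t5)
            + p1 * (10 * t3 - 15 * t4 + 6 * t5)
        )

    return join
```

Because the six end conditions determine the quintic completely, there is no freedom left to suppress oscillations inside the join, which is the difficulty discussed in the text.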

The success of these attempts is judged by the degree
of "smoothness" achieved, a necessary (though hardly suff-
icient) condition being the absence of turning points
in the join regions. If the join is not sufficiently smooth
we might make the trial form more flexible by adding more
variable parameters. However, in general more parameters
mean a greater likelihood of oscillations, and it is diff-
icult to see how the parameters could be chosen expressly
to eliminate these. The most probable cause of lack of
smoothness is an essential incompatibility between the
different segments of data as given, and alterations to
one or more of the segments may be necessary.
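
The necessary condition mentioned above, the absence of turning points in a join region, is easy to test numerically on a table of values of the join. The following sketch counts changes in the sign of successive differences; the function name is ours.

```python
from typing import Sequence


def count_turning_points(values: Sequence[float]) -> int:
    """Number of times successive differences change sign (flat steps are ignored)."""
    count = 0
    previous_sign = 0
    for a, b in zip(values, values[1:]):
        diff = b - a
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                count += 1
            previous_sign = sign
    return count
```

A join tabulated finely enough should return zero; any other result indicates an oscillation of the kind described.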

The position is rather different with respect to the
two joins. If the RKR data is inconsistent with the
long-range asymptotic form, it means that we are applying
the latter at distances which are too small. We can move
out the upper end of the join, which was quite arbitrary
in any case. At the lower join, the Born-Mayer and RKR
potentials may be clearly inconsistent; indeed, in extreme
cases, they might cross. The Born-Mayer potential must
be altered in such a case, but we are not free to move in
the lower end of the join without limit. It would be better
to make some slight alteration to the Born-Mayer potential
as a whole, but because of the difficulty of doing this
in a meaningful way, a new approach altogether in which
the Born-Mayer potential plays almost a secondary role seems
advisable. The opportunity is taken at the same time to
extend the fit down to R = 0, with the aid of what we know
about the asymptotic form there.

For the moment, let us retain the parametric potential 
fitted to the RKR points. We seek a parametric form to 
be fitted to the whole region interior to this, and another 
in the whole region exterior to it, thus reducing the number 
of "pieces" in the potential to three. In the interior 
region we might try

    [equation illegible in source]

The significant fact here is that

    $$V(R) - \frac{Z_1 Z_2 e^2}{R}\, e^{-aR}$$

has a finite limit as R → 0. The figure below plots this
function for several values of "a", though the behaviour
for small "R" is speculative, because in this region we have
only the Born-Mayer data as a guide. We see that if "a"
is too small, the subtracted term tends to swamp the original
potential over too much of the range, whereas if "a" is
too large, the usefulness of having a finite intercept is
lost. Comparison with the Buckingham expansion shows that
if "a" is properly related to the united-atom energy, the
intercept on the V axis is zero, and the term is then
unnecessary in fitting the residual potential. However,
this fact is not of much practical use, since:

(a) the particular residual potential which goes
through the origin appears to cross the axis at
least twice first;

(b) united-atom energies are not available for many
cases of interest;

(c) even where the procedure is practicable (e.g. Li2),
theoretical expectations were not realised.

[Figure: V(R) - (Z1 Z2 e^2 / R) e^{-aR} plotted against R for several values of "a"]

We might do better, therefore, to choose any suitable
value for "a", and retain the term amongst the parameters.
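
The finite intercept referred to above can be made explicit with a short expansion. This is a sketch under two assumptions: that the subtracted term is the screened Coulomb form (Z_1 Z_2 e^2 / R) e^{-aR}, and that the small-R behaviour of V is the Buckingham form with leading terms Z_1 Z_2 e^2 / R + a_0.

```latex
\begin{align*}
V(R) &= \frac{Z_1 Z_2 e^2}{R} + a_0 + O(R^2), \\
\frac{Z_1 Z_2 e^2}{R}\, e^{-aR}
     &= \frac{Z_1 Z_2 e^2}{R} - a\, Z_1 Z_2 e^2 + O(R), \\
\lim_{R \to 0} \left[ V(R) - \frac{Z_1 Z_2 e^2}{R}\, e^{-aR} \right]
     &= a_0 + a\, Z_1 Z_2 e^2 .
\end{align*}
```

On these assumptions the intercept vanishes when a = -a_0 / (Z_1 Z_2 e^2), which is the sense in which "a" can be related to the united-atom energy.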

The parameters in the residual potential would be 
chosen with a view to; 

(1) Continuity of V and its derivatives at the lower
limit of the RKR region;

(2) Approximating the residual part of the Born-
Mayer potential over the appropriate range;

(3) The general requirement of smoothness.

Applying the same ideas to the exterior region, we
might try

    [equation illegible in source]

choosing the parameters for continuity with the RKR potential,
and for smoothness.

There are great difficulties in enforcing all of the
above criteria, particularly that of smoothness. The crit-
erion of continuity may be dealt with by subtracting

    $$\frac{Z_1 Z_2 e^2}{R}\, e^{-aR}$$

from the various data over the whole range, and then trying
to fit something over this whole range. The residual
potential is everywhere finite, and the interval can be
changed to (-1, +1) by the transformation of independent
variable

    [equation illegible in source, followed by a parenthetical remark ending "probably"]

We may then fit the transformed data with Chebyshev poly-
nomials; but the great difficulty will be in reproducing
the correct long-range asymptotic form when this expansion
is transformed back to the infinite interval.
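
A minimal sketch of the idea follows. The transformation actually proposed is not legible in the source, so the mapping x = (R - R0) / (R + R0), which sends R = 0 to -1, R = R0 to 0 and R → ∞ to +1, is purely an assumption. The coefficients are obtained by sampling a function at the Chebyshev nodes (interpolation rather than a least squares fit to scattered data), and all names are ours.

```python
import math
from typing import Callable, List


def to_unit_interval(r: float, r0: float) -> float:
    """Assumed mapping of 0 <= R < infinity onto -1 <= x < 1."""
    return (r - r0) / (r + r0)


def from_unit_interval(x: float, r0: float) -> float:
    """Inverse of the assumed mapping (x must be less than 1)."""
    return r0 * (1.0 + x) / (1.0 - x)


def chebyshev_coefficients(f: Callable[[float], float], n: int) -> List[float]:
    """Coefficients c_j such that f(x) ~ c_0/2 + sum of c_j T_j(x), j = 1..n-1."""
    coeffs = []
    for j in range(n):
        total = 0.0
        for k in range(n):
            theta = math.pi * (k + 0.5) / n
            total += f(math.cos(theta)) * math.cos(j * theta)
        coeffs.append(2.0 * total / n)
    return coeffs


def chebyshev_eval(coeffs: List[float], x: float) -> float:
    """Evaluate the series by the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + 0.5 * coeffs[0]
```

In use, the residual potential would be sampled as a function of x through from_unit_interval before the coefficients are computed. The difficulty noted in the text remains: a polynomial in x does not in general reproduce a prescribed inverse-power tail in R.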

A possible way around this is to make a further
addition to the potential:

    [equation illegible in source]

or

    [equation illegible in source]

(if the next term is O(R^[exponent illegible]))

"b" is not to be so large that the residual potential is
still comparable to V for unsuitably large values of R,
nor so small that V is seriously distorted in the region
of the minimum. We may now cut off the derived potential
at a value of R for which it is very much less (in absolute
value) than V, apply Chebyshev polynomials in this finite
interval, and add back the two terms after the fit has been
made.

We have so far outlined three methods of approach
which deserve further investigation:

(1) a fit in three pieces, i.e. interior region, RKR region,
exterior region.

(2) a fit in one piece, consisting of

    [equation illegible in source]

plus Chebyshev polynomials over an infinite interval.

(3) a fit in one piece, consisting of

    [equation illegible in source]

plus Chebyshev polynomials over a finite interval.



A fourth possibility, going in the opposite direction,
is to adapt the Levine potential, which is so successful
in representing the RKR points, so that it has the correct
asymptotic form at both ends of the range. As it stands,
it goes to R [exponent illegible in source] as R → 0, and
falls off exponentially for large R. If we replace the
previous definition of x by

    [equation illegible in source]

we have the required asymptotic forms. Moreover we have
taken trouble to preserve the interpretation of R_m as the
value of R for which x = 1, which is important if initial
estimates of the parameter R_m are to be accurate. α
is chosen to give the correct behaviour for small R, and
is seen to depend on "a". The least squares fit to the
RKR points is now performed with respect to the parameters
ε, R_m and the curvature at R_m. The expression for the
curvature depends also on both "a" and α, and so, for a
given curvature, we must find "a" and α by solving this
simultaneously with the condition imposed by the short-
range asymptotic form. On the other hand, the constant σ
may vary within quite wide limits without invalidating
the asymptotic form. In assigning it, we may consider

(a) conditions similar to those imposed on "b" on the
previous page;

(b) the known coefficients of the leading inverse
powers of R;

(c) making it a parameter in the fit to the RKR points.
To summarize, the overall purpose of these approaches
to the potential representation problem has been to find
some modified form of the potential curve which permits
an accurate and realistic curve fit to be carried out.
As yet this goal has not been achieved. To cover the
possibility of this not being done successfully a less
subtle form of potential representation based on an
interpolation procedure will have to be implemented.

3.3 Retrieval and Manipulation

3.3.1 Introduction

To understand more fully the characteristics of the
retrieval system and to appreciate the changes and
improvements made over the past year, the reader is
referred to the previous annual report. It is not the
intention here to repeat what has been said in it
except where it is felt necessary to clarify differences
that have taken place during the interim period. The
system's philosophy remains unchanged and is ably
illustrated in the following extract:
"Generally when a scientist retrieves a particular
potential he is not so much interested in the potential
for its own sake, but rather as a means to an end.
Usually he will use it to calculate parameters defining
one of the physical or chemical properties of matter.
An obvious step is therefore to build into the system
the programs which will enable him to carry out such
calculations."

With this objective in mind the system throughout the 
past year has incorporated into its conversational mode 
more user control of the data involved and more emphasis
has been put on the system's manipulation capabilities. 

3.3.2 Retrieval from the user's viewpoint

We now describe the present state of the system. 
It differs from that of a year ago in that then the user 
had no control over which potentials were used in the 
manipulation facilities; now the user can exercise
direct control if he wishes. For example, he could
manipulate with a single stored potential when before he 
had to accept the system's curve fit. 

At the start of a search preliminary information is 
given to the user if he is not familiar with the system. 
The user then specifies a pair of atoms. Unless he is 
interested in only the ground state, which he indicates
by following the atomic symbols with an "X", the system
gives him a numbered list of the states for which
potentials are stored as well as the number of potentials
for each state. The user identifies by number his choice
of state. If there are no states for the user's atom-
pair, he is given the option of the Born-Mayer (B-M) short
range potential; the B-M is only available for a ground
state and is not possible if either atom is hydrogen, in
which case he may then try with another atom-pair. After
choosing a state the user then specifies the energy and 
length units in which he wants his potentials and in 
return he is given two options: 

(i) the ability to choose any of the stored
potentials

(ii) the generation of a potential over the whole
range obtained by fitting a smooth curve to the
stored potentials.

For the first option the user is given a numbered 
list of the potentials - each potential consists of the 
potential type, the range over which the potential is 
valid and whether this range is short, intermediate or
long. 

The user is now invited to select from the potentials
displayed. This selection process is done by choosing
one potential at a time. Subsequent to picking a potential
by its corresponding number, the user is offered any
combination of the following three facilities:

(A) A tabular listing of the potential points 

(B) the potential points stored as part of those to 
be used for further manipulation purposes, and 
currently this is a curve fit for the potential
over the whole range 

(C) an off-line graph plot of the potential. 

If a listing is required the user is asked for the 
number of points he wants and the range over which he 
wants them and whether he wants a short or comprehensive 
description of the potential.

For (B) and (C) the appropriate points are written
to pre-determined areas of the disc. In the latter case
the values are later retrieved in the off-line graph
plotting program, while in the former they are used in the
manipulation program. If the user asks for both (B) and
(C) the two operations are carried out independently.

If the user requests any off-line graphs during the
process of this search then he must give the system his 
name and address for later identification of the graphs. 

3.3.3 Curve-fit of individual potentials

For facility (B) above and the earlier option (ii)
the respective individual potential fits are transferred
to a curve-fitting package which fits a smooth potential
through all the points included, reducing the potential
to an equation with four parameters which should be valid
over the whole range. As was described in Section 3.2 the
present curve fitting program is not entirely satisfactory
and during the past year much effort has gone into an
attempt to help to produce a smoother representation over
the whole range.

When the smooth representation of the potential has 
been determined, the system offers the user the following: 

(a) a tabular listing

(b) further manipulation possibilities.

The tabular listing is similar to that outlined above
for the individual stored potentials; the difference now
being that the user is not confined to asking for a number
of points within a specified range but theoretically he
can choose any range from 0 to ∞. Having specified his
range of interest and number of points he can have them 
listed and he can obtain a variety of off-line graphs. 
Firstly, he can have a curve fit of the potential over
any part of the range, as illustrated in fig. 1 and marked
(1).

[Fig. 1]

He can have the same again but this time with all the
potentials used in finding (1), as illustrated in fig. 2.

3.3.4 Retrieval from the system's viewpoint

The retrieval system works in a time-sharing, batch-
processing multi-access environment and as such there is
a core restriction (at present 18k (24-bit) words)
available to individual on-line programs. It is expected
that a new 1906S ICL computer (presently being installed)
will be operational in the coming summer months. The
present machine (a 1907 ICL computer) will then be used
mainly for multi-access work and so core restrictions will
be greatly improved. Because of this core limitation it
is necessary for the whole retrieval package to consist
of four individual programs. Only one of these programs
can be in core at any one time and so at an appropriate
point each (with the aid of a system command given by the
user) has to activate the next program. Information
between one program and another is conveyed via a
communications file which is held on disc as a temporary
storing place for the required data. The four programs
involved are:

CHAT: the controlling segment which outputs states
      for a user's atom-pair, as well as transferring
      to another program, FITS, the user's choice of
      state and the starting addresses of the
      associated potentials and units required, via
      the communications file.

FITS: This program outputs the stored potentials for
      a chosen state as well as transferring to
      # CURV the selected potentials for curve
      fitting the potential over the whole range and
      storing values for individual graphs.

CURV: This program does the curve fitting of the
      potentials transferred from program FITS as
      well as transferring the parameter equation
      values calculated for the curve over the whole
      range to program DEFL when further manipulation
      is required.

DEFL: This program uses the four parameters trans-
      ferred from CURV to determine the deflection
      angle for any energy and impact parameter given
      by the user.
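
The hand-over between the four programs can be pictured with a small modern sketch in which each step reads the communications file, adds what it has established, and writes the file back for the next program. The JSON format, the key names and the helper functions are all assumptions made for illustration; the actual file layout on the 1907 is not described here.

```python
import json
import os
import tempfile


def write_comm_file(path: str, state: dict) -> None:
    """Save the search state so that the next program can pick it up."""
    with open(path, "w") as handle:
        json.dump(state, handle)


def read_comm_file(path: str):
    """Return the saved state, or None if the search is just starting."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


def run_step(program: str, path: str, new_items: dict) -> dict:
    """One program's turn: load the state, add to it, and save it again."""
    state = read_comm_file(path)
    if state is None:
        state = {"history": []}
    state.update(new_items)
    state["history"].append(program)
    write_comm_file(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        comm = os.path.join(folder, "comm.json")
        run_step("CHAT", comm, {"atom_pair": "HE NE", "state": 1})
        run_step("FITS", comm, {"selected_potentials": [2, 3]})
        print(run_step("CURV", comm, {"parameters": [0.0, 0.0, 0.0, 0.0]}))
```

Because only one program is resident at a time, the file is the sole carrier of the user's choices, which is why each program must begin by reading it.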



3.3.5 Program CHAT

The first thing performed in # CHAT is to discover
if the user is starting his search or returning from
another program. To determine this we call S/R OPEN
where we open our communication file INDXABdFOI 12 and make
the appropriate check. If the user is returning to CHAT
we must reassign values to his "atom-pair", "statename"
and so on. As indicated earlier whenever we go from
one program to another, we must write this information
(atom-pair, etc.) to disc and then read it down again
whenever we have entered the required program. If we
are returning from # CURV or # FITS, the name of our
data file is stored on the communication file. However,
assuming the user is just starting his search, we call
S/R INITIAL. In S/R INITIAL we ask the user if he is
familiar with the system and, if he is not, give him
preliminary information about search strategy and so
forth. In INITIAL we may also set switches to give
intermediate values in various S/Rs; these are used to
help the systems designers to detect errors. INITIAL
itself calls S/R PRELIM, which reads from the data file
information about potentials and addresses on disc of
atom-pairs. We now return to the MASTER.

It is here that the search really begins. The
user is asked to type his "Atom-Pair?". The address of
this question within the context of # CHAT is stored
on disc by calling a PLAN S/R STORE. This is to allow
the user the facility of restarting his search at any
time by simply typing "A" for any user response. The
user's choice of atom-pair is read in S/R ATOMNOS.
ATOMNOS checks if the chemical symbols are valid and
determines the corresponding atomic numbers from a
preset list. It also calls S/R NAMES which checks for
+ or - etc., for a user may type HH+, H-H; he may
in fact follow his atom-pair with an "X" which tells the
system that he only wants to deal with the ground-state,
whereupon a pointer is set to indicate this.

Back in the MASTER another pointer is set to
indicate if either atom is hydrogen. We next call
S/R SEARCH1 which determines the position on disc of the
first state for these atoms. If there are no states
stored for a particular atom-pair we give the user the
option of the BORN-MAYER potential; this potential can
always be generated by substituting in a set equation the
values of certain parameters. Assuming there are states
stored the program calls S/R SEARCH2, which lists out all
the states in the databank for this atom-pair, with the
number of potentials for each state. The user is then
asked for his choice of state. After making his choice
the program determines the address on disc of the first
potential for this state and returns to the MASTER.
If he does not want any of these states, he may try for
another atom-pair. Next we ask the user for the energy
and length units in which he wants his data.

The data on the disc is stored in atomic units and 
the factors necessary to convert the values to the user's 
units are found in the same preset list as for the 
atomic numbers. This is done in S/R UNITER. In 
S/R CLOSEN the communications file is now updated with 
the atom-pair, statename, units, etc. and all files 
closed. The name of the data file is also stored in 
the communications file. Then control is passed to 
another program in the system (# FITS). 
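
The unit conversion can be illustrated with a short sketch. It assumes a table of factors keyed by unit name; the unit names "EV" and "ANGSTROM" and the function name are hypothetical, and only the numerical factors (1 hartree = 27.211386 eV, 1 bohr = 0.529177 angstrom, both rounded) are standard values.

```python
# Factors that multiply a value held in atomic units to give the user's units.
ENERGY_FACTORS = {"AU": 1.0, "EV": 27.211386}       # hartree -> eV
LENGTH_FACTORS = {"AU": 1.0, "ANGSTROM": 0.529177}  # bohr -> angstrom


def convert_from_atomic(r_au, v_au, energy_unit="AU", length_unit="AU"):
    """Convert one stored (R, V) pair from atomic units to the chosen units."""
    return (r_au * LENGTH_FACTORS[length_unit],
            v_au * ENERGY_FACTORS[energy_unit])
```

Keeping the stored data in one system of units and converting only on output means that a new unit needs one extra table entry and no change to the data base.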

On returning to # CHAT from some other program in 
the system the relevant information is read from the 
communications file and the data file opened. The user 
may then try another state for the same pair of atoms 
and have the states listed again if he wants. Failing 
this he may then try another pair of atoms or cease 
execution of the program. When given the choice of 
another pair of atoms, the user is first asked "Other 
Atoms?". In reply he need not type "Yes" or "No" but 
may type immediately the symbols for his atom-pair. 
Similarly in reply to "More States?" he may type the 
number of the state he wants. 
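
The role of the communications file, which carries the user's choices from one separately loaded program to the next, can be sketched as below. This is only an illustration of the idea in Python: the JSON layout, the field names and the file name are assumptions, not the format used on the original disc.

```python
import json
import os
import tempfile


def write_comm_file(path, state):
    """Save the search state before control passes to another program."""
    with open(path, "w") as f:
        json.dump(state, f)


def read_comm_file(path):
    """Restore the search state on entry to the next program."""
    with open(path) as f:
        return json.load(f)


# One program records the user's choices ...
comm_path = os.path.join(tempfile.mkdtemp(), "comm.json")
write_comm_file(comm_path, {
    "atom_pair": "K K",
    "state": "K2 X 1 SIGMA+(G)",
    "energy_unit": "AU",
    "length_unit": "AU",
    "data_file": "POTENTIALS",   # hypothetical data file name
})

# ... and the next program picks them up again.
restored = read_comm_file(comm_path)
```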

3.3.6 Program FITS 

As # CHAT retrieves states for a chosen atom-pair, 
so # FITS retrieves potentials for a chosen state. 
The first thing done is to reassign values to the 
atom-pair, statename and number of potentials for the 
state chosen. This is done in S/R OPEN1, which also 
calls S/R PRELIM to read information about the potentials. 
Furthermore, it calls S/R SETGRAPH which sets up the 
graph and manipulation buckets and determines if there is 
room available in the graph area. The user is then asked 
"ALL or BEST?". Here he has the choice:- 

ALL:-  He may have all the potentials listed and 
       choose whichever he wants as described below. 

BEST:- Here all the potential values are written to disc 
       for use in the curve fitting program. In this 
       case the user has lost control of individual 
       potentials though in CURV he may obtain a 
       graph of all the potentials together with the 
       fitted potential. 

The potentials are listed in S/R DATAREAD1 and the user is 
asked to choose one. If none of the potentials listed 
is suitable to the user, he may return to # CHAT to try 
another state for his atom-pair or indeed a different 
atom-pair. Provided the user chooses a potential, he 
is then asked "LIST, STORE, GRAPH?" which means:- 

LIST:-  does he want the values for this potential 
        listed? 

STORE:- does he want these values stored for further 
        manipulation? 

GRAPH:- does he want a graph of these values? 

The user response to these takes the form of three integers, 
each taking the value 0 or 1. The latter indicates that 
the user is interested in the appropriate option. The 
user can have any combination of these three possibilities. 
We now call S/R DATAREAD1 which this time will output 
the user's chosen potential. DATAREAD1 will itself call 
S/R ABRAHAMS to output the short Born-Mayer potential, 
S/R DATANUM to output the intermediate RKR potential, or 
S/R RETC6 to output the long C6 potential. In each of 
these S/Rs the values will be listed, stored for 
manipulation, or stored for a graph, depending on the 
user's responses to the options above. Furthermore 
DATAREAD1 will call S/R DECODE if the user requires a 
description of the potential; DECODE will give, for 
example, author and reference of the paper from which the 
data was extracted, the amount of information given 
depending on the user's answer to "SHORT?". DATAREAD1 
also calls S/R ERROROUT which gives what is considered 
to be the relative or absolute error of the data. 
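
As an illustration of the listing step for the long-range potential, the sketch below tabulates V = -C6/R**6 at a requested number of points. It is not the authors' routine: the function name is hypothetical, and the spacing rule (the range divided into N+1 equal intervals, end points excluded) is an inference from the values printed in the sample conversation of Section 3.4, not something stated in the text.

```python
def list_long_range(c6, n_points, r_low, r_high):
    """Tabulate V = -C6/R**6 at n_points interior points of (r_low, r_high)."""
    step = (r_high - r_low) / (n_points + 1)   # inferred spacing rule
    rows = []
    for i in range(1, n_points + 1):
        r = r_low + i * step
        rows.append((r, -c6 / r**6))
    return rows


# Li-K long-range potential as requested in the sample conversation:
# C6 = 0.2290E 04 (AU), 14 points between 20.4 and 31.6 (AU).
rows = list_long_range(0.2290e4, 14, 20.4, 31.6)
for r, v in rows:
    print(f"{r:8.2f}  {v: .4E}")
```

Under these assumptions the first point falls at R = 21.15 (AU), in agreement with the first line of the teletype listing.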

Back in the MASTER we ask the user if he wants 
another potential (Yes, No or actual number of potential). 
If he types "Yes" or the actual number we go back for 
another search. If the user does not want another 
potential, we then check if he wants a graph, etc. 
Should he want a graph we call S/R STORENAME, which will 
ask for his name and address (to identify the graph) and 
write this information to disc. If he wants the values 
stored for manipulation, we call S/R CLOSEN1 to write the 
communication bucket to disc and then prompt him to 
type "£RUN,CURV"; otherwise we again call CLOSEN1 but 
this time ask him to type "£RUN,CHAT". 

Finally a word about LIST, STORE and GRAPH in the 
context just described. A user can always have the 
values listed out. He need not limit himself to having 
a graph of some of the potentials or storing them all for 
manipulation; he could for example get a graph of 1 
potential and store 3 potentials for manipulation. At 
the moment further manipulation can be accomplished in 
the calculation of a smooth potential over the whole 
range and then the determination of deflection angles. 
More description of these facilities will be given in 
the following sections. 



3.3.7 Program CURV 



As in # CHAT and # FITS the communications file is 
opened and a check is made of the number of users with 
data already stored for graphs - there is a maximum of 4 
put on the number available at any one time. We then 
call S/R READDOWN to read from disc the values stored for 
manipulation. Next we call S/R FITS, which curvefits 
the values using S/R MA02A*; this gives us four 
parameters which, when substituted in an equation, 
provide values for the potential (V) for every value 
over the range (R) from 0 to infinity (Fig. 1). 




*S/R MA02A is an Atlas routine which solves a set of 
linear simultaneous equations. 

It is then necessary to know if the user wants values 
listed in a certain range or if he is interested in 
deflection angles. In the latter case the user is 
prompted to pass control to # DEFL. If, however, he 
wants a list of values, he is asked for the range he 
requires and the number of points in this range. The 
values for the potential are then generated at these 
points and the results listed out. Having studied the 
values the user may reply in three ways to the question 
"GRAPH?". 

1. If the user decides that the values are unsatisfactory, 
   he should type 0. 

2. If he types 1, he will obtain graphs of the stored 
   values and the curvefitted points on the one frame. 
   (See Fig. 2) 

3. If he types 2, he will obtain a graph of the curve- 
   fitted potential alone. (See Fig. 3) 



Fig. 2                         Fig. 3 

In case 2 the curvefitted values are first written 
to the user work area of the disc where the stored values 
are already, then all the values are transferred to the 
graph area. In case 3 the curvefitted values overwrite 
the stored values and are then transferred to the graph 
area. 

Whatever his reply the user may return to # CHAT 
to start a new search. 
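
The curve-fitting step of CURV, in which four parameters are obtained by solving a set of linear simultaneous equations, can be sketched as a linear least-squares fit. The text does not give the fitting equation, so the four basis functions below (1, 1/R, 1/R**2, 1/R**3) are purely hypothetical placeholders, and the small Gaussian elimination routine merely stands in for the library routine MA02A.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical basis: V(R) = c0 + c1/R + c2/R**2 + c3/R**3.
BASIS = [lambda r: 1.0, lambda r: 1.0 / r,
         lambda r: 1.0 / r**2, lambda r: 1.0 / r**3]


def fit_four_parameters(points):
    """Least-squares fit of (R, V) pairs via the normal equations."""
    n = len(BASIS)
    ata = [[sum(BASIS[i](r) * BASIS[j](r) for r, _ in points)
            for j in range(n)] for i in range(n)]
    atb = [sum(BASIS[i](r) * v for r, v in points) for i in range(n)]
    return solve_linear(ata, atb)


def evaluate(params, r):
    """Value of the fitted potential at R."""
    return sum(c * f(r) for c, f in zip(params, BASIS))
```

Once the four parameters are known, `evaluate` gives the fitted potential at any R in the range the user asks for, which is all the listing and graph options need.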

3.3.8 Program DEFL 

To obtain a deflection angle it is necessary to have 
a smooth curve for the potential over the whole range. 
The potential can be defined by four parameters 
determined in CURV; these parameters are used in an 
equation to give a value for the potential at any point. 

The user is first asked for the energy he wants. 
From this it can be determined whether orbiting occurs, 
and the critical impact parameter can be found. These 
results are given to the user and he is then asked for 
his choice of impact parameter. The deflection angle 
is then found and output to the user. He may vary his 
energy and impact parameter value at will. A particular 
use of this facility is illustrated in the next section. 
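
A numerical evaluation of the classical deflection angle can be sketched as follows. This is an illustrative Python routine, not the DEFL program: it uses the standard classical expression (deflection angle = pi - 2b times the integral of dr / (r**2 sqrt(1 - b**2/r**2 - V(r)/E)) from the turning point outwards), a simple inward scan with bisection to find the outermost turning point, and a midpoint rule after a change of variable that removes the end-point singularity. All function names and the step parameters are assumptions, and no attempt is made to handle orbiting.

```python
import math


def outermost_zero(F, r_start=1.0e3, shrink=0.99, r_min=1.0e-6):
    """Scan inward from r_start until F changes sign, then bisect."""
    hi = r_start
    if F(hi) <= 0.0:
        raise ValueError("F must be positive at r_start")
    lo = hi * shrink
    while F(lo) > 0.0:
        hi, lo = lo, lo * shrink
        if lo < r_min:
            raise ValueError("no turning point found")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi   # F(hi) is still (just) positive


def deflection_angle(V, E, b, n=4000):
    """Classical deflection angle for potential V(r), energy E, impact parameter b."""
    F = lambda r: 1.0 - (b / r) ** 2 - V(r) / E
    r_m = outermost_zero(F)
    # Substitute x = r_m / r, then x = 1 - t**2 to remove the singularity at x = 1.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x = 1.0 - t * t
        total += 2.0 * t / math.sqrt(F(r_m / x))
    return math.pi - 2.0 * (b / r_m) * total * h
```

Two simple checks are available: a vanishing potential must give no deflection, and a repulsive Coulomb potential k/r must reproduce the Rutherford result 2 arctan(k / (2 E b)).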
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3.3.9 Further manipulation 



With a smooth representation over the whole range we 
can obtain quantities like the transport properties which 
are obtained from the collision integrals (see "Molecular 
Theory of Gases and Liquids" by Hirschfelder, Curtiss and 
Bird). 



where T = temperature 

      μ = reduced mass of the two interacting systems 
      k = Boltzmann's constant 

The collision cross-section Q(E) depends on the initial 
relative energy E and is given by 



where b is the impact parameter and χ is the classical 
deflection angle 



in which the lower limit of integration is the outermost 
zero of 



At present the system includes the program to 
calculate deflection angles, with the user simply providing 
values for E and b. The next step will be to include 
a program to calculate the cross-sections for values of 
E, and then the collision integral program. These two 
programs will have to be run off-line, the data being set 
up on-line. 

Note: Further numerical methods are discussed in 
The Journal of Chemical Physics, 41, pp. 3560-3568 (1964). 
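
For reference, the quantities named in this section take the following standard textbook form (as in Hirschfelder, Curtiss and Bird). The notation here, including the symbols Q, χ, V and r_m and the order indices l and s, is an assumption and may differ from that of the equations originally printed in this section.

```latex
% Collision integral at temperature T (mu = reduced mass, k = Boltzmann's constant)
\Omega^{(l,s)}(T) = \sqrt{\frac{kT}{2\pi\mu}}
    \int_0^\infty e^{-\gamma^2}\,\gamma^{2s+3}\,Q^{(l)}(E)\,d\gamma ,
    \qquad \gamma^2 = \frac{E}{kT}

% Collision cross-section at initial relative energy E (b = impact parameter)
Q^{(l)}(E) = 2\pi \int_0^\infty \bigl(1 - \cos^{l}\chi(E,b)\bigr)\, b\, db

% Classical deflection angle; r_m is the outermost zero of
% 1 - b^2/r^2 - V(r)/E
\chi(E,b) = \pi - 2b \int_{r_m}^{\infty}
    \frac{dr}{r^{2}\sqrt{1 - b^{2}/r^{2} - V(r)/E}}
```

The three steps mirror the planned programs: deflection angles for given E and b, then cross-sections for given E, then the collision integral as a function of temperature.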
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3.4 A Typical User-System Conversation 

We give here a listing of a conversation between 
an experienced user and the retrieval and manipulation 
system. The printout is exactly as obtained on a 
teletype. Lines such as 
CHAT: ATOM PAIR? 

or 

MCS: CHAT (CORE: 13312) 
show output to the user from the programs CHAT and MCS 
(the time-sharing control program supervising the 
execution of CHAT) while a line such as 
: K K 

shows information input by the user. 

At a number of points we have added brief explanations 
of the user's replies. These are given at the righthand 
side in brackets opposite the relevant reply. In 
particular in these explanations we refer to graphs which 
are obtained from the user's choices. These graphs, 
numbered one to four, are given immediately at the end of 
this section. 






Conversation: 

: £RUN,CHAT 
MCS: CHAT (CORE: 11136) 
CHAT: INTERATOMIC POTENTIALS 
CHAT: FAMILIAR WITH SYSTEM? - "YES" OR "NO"? 
: YES                              [System comments like this 
                                    will be eliminated shortly] 
CHAT: ATOM PAIR? 
: LI K X 
CHAT: ENERGY AND LENGTH UNITS? 
: AU AU 
CHAT: TYPE "£RUN,FITS" 
: £RUN,FITS 
MCS: FITS (CORE: 14336) 
FITS: ALL OR BEST? 
: ALL                              [User is asking for all 
                                    potentials stored for 
                                    this state] 
FITS: POTENTIALS STORED FOR LI K X 
FITS: NO  TYPE                   RANGE (AU) 
FITS:  1  VAN DER WAAL'S COEFF   10.9 TO 99.0 (LONGRANGE) 
FITS:     (97) 
FITS:  2  BORN-MAYER             1.5 TO 3.0 (SHORT) 
FITS: 99  OTHER FITS 
FITS: WHICH? 
: 1                                [See Graph 1] 
FITS: ERROR (RELATIVE):- 10.0% 
FITS: LIST, STORE OR GRAPH? 
: 1 0 1 
FITS: SHORT DESCRIPTION? 
: YES 
FITS: V = -C6/R**6 
FITS: WITH R IN AU, C6 = 0.2290E 04 GIVES V IN AU 
FITS: NO. OF POINTS AND RANGE? 
: 14 20.4 31.6 
FITS: R(AU)    V(AU) 
FITS: 21.15    -0.2561E-04 
FITS: 21.89    -0.2080E-04 
FITS: 22.64    -0.1701E-04 
FITS: 23.39    -0.1400E-04 
FITS: 24.13    -0.1159E-04 
FITS: 24.88    -0.9655E-05 
FITS: 25.63    -0.8085E-05 
FITS: 26.37    -0.6805E-05 
FITS: 27.12    -0.5756E-05 
FITS: 27.87    -0.4890E-05 
FITS: 28.61    -0.4173E-05 
FITS: 29.36    -0.3575E-05 
FITS: 30.11    -0.3075E-05 
FITS: 30.85    -0.2655E-05 
FITS: MORE POTENTIALS? 
: NO 
FITS: NAME, ADDRESS ON 2 LINES 
: ROBIN MCDONOUGH 
: 78 MALONE ROAD 
FITS: TYPE "£RUN,CHAT" 
: £RUN,CHAT 
MCS: CHAT (CORE: 11136) 
CHAT: MORE STATES? 
: A                                [User wants to start 
                                    a new search] 
CHAT: ATOM PAIR? 
: K K 
CHAT: STATES STORED FOR K - K 
CHAT: NO.  STATE               NO. OF POTENTIALS 
CHAT:  1   K2 X 1 SIGMA+(G)    4 
CHAT:  2   K2 B 1 PI (U)       1 
CHAT: 99   ANY OTHER STATE 
CHAT: WHICH? 
: 1 
CHAT: ENERGY AND LENGTH UNITS? 
: AU AU 
CHAT: TYPE "£RUN,FITS" 
: £RUN,FITS 
MCS: FITS (CORE: 14336) 
FITS: ALL OR BEST? 
: ALL 
FITS: POTENTIALS STORED FOR K2 X 1 SIGMA+(G) 
FITS: NO.  TYPE                   RANGE (AU) 
FITS:  1   VAN DER WAAL'S COEFF   12.0 TO 99.0 (LONGRANGE) 
FITS:      (123) 
FITS:  2   RKR                    6.04 TO 9.58 (INTERMEDIATE) 
FITS:      (55) 
FITS:  3   RKR                    6.24 TO 9.15 (INTERMEDIATE) 
FITS:      (1) 
FITS:  4   BORN-MAYER             1.5 TO 3.5 (SHORT) 
FITS: 99   OTHER FITS 
FITS: WHICH? 
: 1 
FITS: ERROR (RELATIVE):- 10.00% 
FITS: LIST, STORE OR GRAPH? 
: 0 1 0                            [User is asking for this 
                                    particular potential to 
                                    be stored] 
FITS: NO. OF POINTS AND RANGE? 
: 10 12.0 14.0 
FITS: MORE POTENTIALS? 
: 2 
FITS: ERROR (RELATIVE):- 1.0% 
FITS: LIST, STORE OR GRAPH? 
: 0 1 0 
FITS: MORE POTENTIALS? 
: NO 
FITS: TYPE "£RUN,CURV" 
: £RUN,CURV 
MCS: CURV (CORE: 10880) 
CURV: LIST OR DEFLECTION ANGLE? 
: 1 0                              [User is asking for list of 
                                    points fitted to potentials 
                                    chosen by him] 
CURV: NO. OF POINTS AND RANGE? 
: 20 4.8 12.6 
CURV: FIT FOR K2 X 1 SIGMA+(G) 
CURV: R(AU)    V(AU) 
CURV: 4.800     0.1008E 00 
CURV: 5.211     0.4421E-01 
CURV: 5.621     0.1526E-01 
CURV: 6.032     0.1065E-02 
CURV: 6.442    -0.5245E-02 
CURV: 6.853    -0.7478E-02 
CURV: 7.263    -0.7564E-02 
CURV: 7.674    -0.6790E-02 
CURV: 8.084    -0.5707E-02 
CURV: 8.495    -0.4613E-02 
CURV: 8.905    -0.3643E-02 
CURV: 9.316    -0.2840E-02 
CURV: 9.726    -0.2204E-02 
CURV: 10.14    -0.1718E-02 
CURV: 10.55    -0.1355E-02 
CURV: 10.96    -0.1091E-02 
CURV: 11.37    -0.9018E-03 
CURV: 11.78    -0.7701E-03 
CURV: 12.19    -0.6807E-03 
CURV: 12.60    -0.6221E-03 
CURV: GRAPH? 
: 1                                [User is asking for a graph 
                                    both of the chosen potentials 
                                    and the fitted values] 
CURV: NAME, ADDRESS ON 2 LINES 
: jaml:; i clla:j 
: Q.U.B. 
CURV: TYPE "£RUN,CHAT" 
: £RUN,CHAT 
MCS: CHAT (CORE: 11136) 
CHAT: MORE STATES? 
: 1 
CHAT: ENERGY AND LENGTH UNITS? 
: AU AU 
CHAT: TYPE "£RUN,FITS" 
: £RUN,FITS 
MCS: FITS (CORE: 14336) 
FITS: ALL OR BEST? 
: ALL 
FITS: POTENTIALS STORED FOR K2 X 1 SIGMA+(G) 
FITS: NO.  TYPE                   RANGE (AU) 
FITS:  1   VAN DER WAAL'S COEFF   12.0 TO 99.0 (LONGRANGE) 
FITS:      (123) 
FITS:  2   RKR                    6.04 TO 9.58 (INTERMEDIATE) 
FITS:      (55) 
FITS:  3   RKR                    6.24 TO 9.15 (INTERMEDIATE) 
FITS:      (1) 
FITS:  4   BORN-MAYER             1.5 TO 3.5 (SHORT) 
FITS: 99   OTHER FITS 
FITS: WHICH? 
: 1 
FITS: ERROR (RELATIVE):- 10.00% 
FITS: LIST, STORE OR GRAPH? 
: 1 1 0 
FITS: SHORT DESCRIPTION? 
: YES 
FITS: V = -C6/R**6 
FITS: WITH R IN AU, C6 = 0.3820E 04 GIVES V IN AU 
FITS: NO. OF POINTS AND RANGE? 
: 10 12.0 13.0 
FITS: R(AU)    V(AU) 
FITS: 12.09    -0.1223E-02 
FITS: 12.18    -0.1169E-02 
FITS: 12.27    -0.1118E-02 
FITS: 12.36    -0.1070E-02 
FITS: 12.45    -0.1024E-02 
FITS: 12.55    -0.9798E-03 
FITS: 12.64    -0.9383E-03 
FITS: 12.73    -0.8988E-03 
FITS: 12.82    -0.8612E-03 
FITS: 12.91    -0.8254E-03 
FITS: MORE POTENTIALS? 
: NO 
FITS: TYPE "£RUN,CURV" 
: £RUN,CURV 
MCS: CURV (CORE: 10880) 
CURV: LIST OR DEFLECTION ANGLE? 
: 0 0 
CURV: NO. OF POINTS AND RANGE? 
: 10 f,.0 13.0 
CURV: GRAPH? 
: 1                                [User wants graph without 
                                    examining points.] 
CURV: NAME, ADDRESS ON 2 LINES 
: 1 n k-i 1 P P I ^ 1 
: COMPUTER CENTRE 
CURV: TYPE "£RUN,CHAT" 
: £RUN,CHAT 
MCS: CHAT (CORE: 11136) 
CHAT: MORE STATES? 
: 1 
CHAT: ENERGY AND LENGTH UNITS? 
: AU AU 
CHAT: TYPE "£RUN,FITS" 
: £RUN,FITS 
MCS: FITS (CORE: 14336) 
FITS: ALL OR BEST? 
: BEST                             [User wants system to 
                                    obtain best fit to 
                                    all stored potentials] 
FITS: TYPE "£RUN,CURV" 
: £RUN,CURV 
MCS: CURV (CORE: 10880) 
CURV: LIST OR DEFLECTION ANGLE? 
: 1 0 
CURV: NO. OF POINTS AND RANGE? 
: 16 4.0 9.0 
CURV: FIT FOR K2 X 1 SIGMA+(G) 
CURV: R(AU)    V(AU) 
CURV: 4.000     0.2666E 00 
CURV: 4.333     0.1751E 00 
CURV: 4.667     0.1138E 00 
CURV: 5.000     0.7218E-01 
CURV: 5.333     0.4363E-01 
CURV: 5.667     0.2403E-01 
CURV: 6.000     0.1060E-01 
CURV: 6.333     0.1485E-02 
CURV: 6.667    -0.4613E-02 
CURV: 7.000    -0.8590E-02 
CURV: 7.333    -0.1107E-01 
CURV: 7.667    -0.1250E-01 
CURV: 8.000    -0.1320E-01 
CURV: 8.333    -0.1340E-01 
CURV: 8.667    -0.1325E-01 
CURV: 9.000    -0.1286E-01 
CURV: GRAPH? 
: 2                                [User wants graph of 
                                    fitted potential alone. 
                                    See Graph 4.] 
CURV: NAME, ADDRESS ON 2 LINES 
: GERRY MCGLINCHEY 
: ANDERSONSTOWN 
CURV: TYPE "£RUN,CHAT" 
: £RUN,CHAT 
MCS: CHAT (CORE: 11136) 
CHAT: MORE STATES? 
: A 
CHAT: ATOM PAIR? 
: K KR X 
CHAT: ENERGY AND LENGTH UNITS? 
: AU AU 
CHAT: TYPE "£RUN,FITS" 
: £RUN,FITS 
MCS: FITS (CORE: 14336) 
FITS: ALL OR BEST? 
: BEST 
FITS: TYPE "£RUN,CURV" 
: £RUN,CURV 
MCS: CURV (CORE: 10880) 
CURV: LIST OR DEFLECTION ANGLE? 
: 0 1 
CURV: TYPE "£RUN,DEFL" 
: £RUN,DEFL 
MCS: DEFL (CORE: 9216) 
DEFL: UNITS TO BE USED: 
DEFL: LENGTH IN AU AND ENERGY IN AU 
DEFL: REQUIRED ENERGY, E? 
: 1.234 
DEFL: ORBITING OCCURS IN THE RANGE 1.14005 (AU) TO 
DEFL: 1.32153 (AU) 
DEFL: CRITICAL IMPACT PARAMETER = 0.336 (AU) 
DEFL: IMPACT PARAMETER, B? 
: 0.9464 
DEFL: FOR E=1.234 (AU) AND B=0.946 (AU), 
DEFL: DEFLECTION ANGLE = 0.1425E 01 (RADIANS) 

DEFL: ANOTHER IMPACT PARAMETER, B? 
: 2.865 
DEFL: FOR E=1.234 (AU) AND B=2.865 (AU), 
DEFL: DEFLECTION ANGLE = 0.6290E 00 (RADIANS) 
DEFL: ANOTHER IMPACT PARAMETER, B? 
: 3.498 
DEFL: FOR E=1.234 (AU) AND B=3.498 (AU), 
DEFL: DEFLECTION ANGLE = 0.4632E 00 (RADIANS) 
DEFL: ANOTHER IMPACT PARAMETER, B? 
: NO 
DEFL: A DIFFERENT ENERGY, E? 
: 1.185 
DEFL: ORBITING OCCURS IN THE RANGE 1.14005 (AU) TO 
DEFL: 1.32153 (AU) 
DEFL: IMPACT PARAMETER, B? 
: 0.876 
DEFL: FOR E=1.185 (AU) AND B=0.876 (AU), 
DEFL: DEFLECTION ANGLE = 0.1539E 01 (RADIANS) 
DEFL: ANOTHER IMPACT PARAMETER, B? 
: 5.803 
DEFL: FOR E=1.185 (AU) AND B=5.803 (AU), 
DEFL: DEFLECTION ANGLE = 0.1149E 00 (RADIANS) 

[The next search is for a different potential] 

DEFL: REQUIRED ENERGY, E? 
: 0.3825 
DEFL: "NO ORBITING" 
DEFL: CRITICAL IMPACT PARAMETER = 0.000 (AU) 
DEFL: IMPACT PARAMETER, B? 
: 1.007 
DEFL: FOR E=0.3825 (AU) AND B=1.007 (AU), 
DEFL: DEFLECTION ANGLE = 0.3143E-08 (RADIANS) 
DEFL: ANOTHER IMPACT PARAMETER, B? 
: NO 
DEFL: A DIFFERENT ENERGY, E? 
: 16.14 

3.5 Miscellaneous Developments 

3.5.1 Editing program 

A program named CI24 has been written to enable the 
potential data once stored to be altered. It is 
envisaged that this program will be needed in three 
different situations. 

Firstly, alterations may be required to correct 
mistakes inadvertently made when the data was first 
stored. The program will in particular have to allow 
for changes in potential values as well as in titles, 
references, and the various codes used to describe the 
type and range of the potential. Secondly, the 
assessment of the accuracy and range of validity of a 
particular potential by a consultant expert will in 
many cases differ from our original evaluation and so 
the capability to effect this change will be needed. 
Finally, it may be necessary from time to time to 
replace some information already stored. For example, 
more accurate potential well depths may become 
available, or additional relevant information might be 
added to that already stored. The present program 
allows for the possibility of all these changes. 

The actual editing is carried out on-line and a 
number of examples of how this is done are given at 
the end of this section. 

3.5.2 Checking program DI23 

This program writes out the list of potentials of 
the same type for pairs of atoms. It enables an easy 
comparison to be made of all the values stored for a 
particular potential describing a particular diatomic 
interaction. 

The use made of this program will be twofold. 
Firstly, by comparing one stored potential for a system 
with another for the same system, a check for possible 
errors in extraction and storing can be made. Secondly, 
it will present the potential data in a form which can be 
easily evaluated by our outside consultant experts. 

Experience has shown the necessity of a careful 
check on the data stored and, with the database of 
potentials now almost complete, this can now be carried 
out with the use of this program. 

Example of Numerical Data Editing Program CI24 

This program alters or deletes complete potentials. 
A complete potential consists of statelist, reference record 
and potential record. If more than one section is 
being changed at any time, they must be taken in the 
following order:- 

1. Statelist 

2. Reference Record 

3. Potential Record 

The potential record is made up of five small 
sections and the program considers each one in turn and 
makes the appropriate alterations. If the potential 
record has been found to be useless, then the complete 
potential is eliminated. 

We here give a typical conversation between the 
user and the program CI24 when we, for example, make the 
following changes:- 

1. Eliminate a potential 

2. Change a state name 

3. Change a reference 

4. Change an actual potential value. 

Lines such as 

CI24: POTENTIAL ADDRESS? 

or 

MCS: CI24 (CORE: 12032) 
show the output on the teletype to the user from the 
programs CI24 and MCS (the time-sharing control program 
supervising the execution of CI24) respectively, while 
a line like 

: 51947 

shows information input by the user. 

: £RUN,CI24 
MCS: CI24 (CORE: 12032) 
CI24: POTENTIAL ADDRESS? 
: 51947 

CI24: ELIMINATE A POTENTIAL? 
: YES 

CI24: (1) BUCKET NO OF STATE? (2) STATE NO? (3) PREVIOUS 
      POTENTIAL ADDRESS? (THIS IS ZERO IF FIRST POTENTIAL 
      IN STRING IS BEING ELIMINATED) (4) NEXT POTENTIAL 
      ADDRESS? (THIS IS ZERO IF LAST POTENTIAL IN STRING 
      IS BEING ELIMINATED) 
: 98 1 51895 52575 

CI24: CHANGE MORE POTENTIALS? 
: YES 

CI24: POTENTIAL ADDRESS? 
: 49073 

CI24: ELIMINATE A POTENTIAL? 
: NO 

CI24: WHICH TYPE OF ALTERATION? GIVE NUMBER ONLY (CHANGES 
      TO EACH COMPLETE POTENTIAL MUST BE IN THE ORDER 
      GIVEN) (1) STATELIST (2) REFERENCE RECORD 
      (3) POTENTIAL RECORD 
: 1 

CI24: BUCKET NO? STATE NO? 
: 127 1 

CI24: NEW STATE NAME? 
: HG HE X 

CI24: HG HE X 

CI24: MORE ALTERATIONS TO SAME POTENTIAL? 
: YES 

CI24: WHICH TYPE OF ALTERATION? GIVE NUMBER ONLY 
: 2 

CI24: TITLE OF ARTICLE? 
: LONG-RANGE INTERACTIONS OF MERCURY ATOMS 

CI24: LONG-RANGE INTERACTIONS OF MERCURY ATOMS 

CI24: AUTHOR NAME(S)? JOURNAL NAME? ETC. 
: W.C. STWALLEY & H.L. KRAMER J. CHEM. PHYS. 49, 
  5555 (1968) 

CI24: MORE ALTERATIONS TO SAME POTENTIAL? 
: YES 

CI24: WHICH TYPE OF ALTERATION? GIVE NUMBER ONLY 
: 3 

CI24: CHANGE ACTUAL POTENTIAL VALUES? 
: YES 

CI24: NEW VALUE? 
: 15.30 

CI24: 15.30 

CI24: CHANGE MORE POTENTIALS? 
: NO 

CI24: HALTED:- 00 
: £ENDJOB. 
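
The elimination step in the conversation above asks for the previous and next potential addresses, with zero standing for "none". This suggests that the potentials of a state are chained together by address, so that eliminating one amounts to unlinking it from the chain. The sketch below illustrates that reading in Python; the record layout, the `state_heads` table and the function name are assumptions, not the stored format.

```python
def eliminate_potential(records, state_heads, state_key,
                        address, prev_address, next_address):
    """Unlink the potential at `address` from its chain (0 means 'none')."""
    if prev_address == 0:
        # The first potential in the string is being eliminated:
        # the state must now point at the next one.
        state_heads[state_key] = next_address
    else:
        records[prev_address]["next"] = next_address
    del records[address]


# Addresses taken from the sample conversation; the state key is hypothetical.
records = {51895: {"next": 51947},
           51947: {"next": 52575},
           52575: {"next": 0}}
state_heads = {"state-1": 51895}

eliminate_potential(records, state_heads, "state-1", 51947, 51895, 52575)
```

After the call the chain runs from 51895 directly to 52575, which is why the program needs both neighbouring addresses before it can remove a record.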



4. ASSESSMENT OF PRESENT POSITION 

In assessing the present position of the numerical 
databank we looked separately at the three branches of the 
work, viz. data extraction, potential representation, and 
retrieval and manipulation. 

With the extraction of the interatomic potentials 
now almost complete there are three separate matters to be 
dealt with in the future. Firstly, the data base of 
interatomic potentials once completed must be kept up to 
date with all new publications. Secondly, a check of 
the stored potentials must be made both to ensure no 
errors have been made in storing and also to complete, with 
the help of our outside consultants, the evaluation of 
the data. Finally the subsidiary data bases of quantities 
such as oscillator strengths and polarizabilities must be 
created. 

The problem of obtaining a satisfactory representation 
of the potential energies to enable manipulative 
procedures to be readily applied to them continues to be 
the most difficult one facing us. Our intention is to 
continue to seek a form of the potential data to which 
polynomials can be best fitted. Section 3.2 gave some 
indication of the "reduced" forms of potential curves 
which have so far been tried. Should no such suitable 
"reduced" form be obtained a straightforward interpolation 
procedure using the stored potential points will be used 
as an alternative. 
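
The fallback mentioned above, a straightforward interpolation in the stored potential points, could be as simple as the following sketch. Linear interpolation is assumed here purely for illustration; the text does not say which interpolation procedure would be used, and the function name is hypothetical.

```python
from bisect import bisect_right


def interpolate(points, r):
    """Linear interpolation in a list of (R, V) pairs sorted by increasing R."""
    rs = [p[0] for p in points]
    if not rs[0] <= r <= rs[-1]:
        raise ValueError("R lies outside the stored range")
    # Index of the first stored R greater than r, clamped to the last interval.
    i = min(bisect_right(rs, r), len(rs) - 1)
    (r0, v0), (r1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (r - r0) / (r1 - r0)
```

Such a procedure needs no fitted parameters, but it is only defined inside the range of the stored points, whereas a fitted form can be evaluated over the whole range.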

Finally, the retrieval and manipulation side of the 
system in its present form not only allows for the 
retrieval of the stored potentials in tabular sets of 
chosen co-ordinates and/or their respective off-line 
graphical representations but has made a start towards the 
inclusion of programs for deriving useful results from 
these potentials. To date, part of a large program for 
calculating transport properties of gases to any accuracy 
has been implemented into the system; this allows for 
the on-line calculation of deflection angles (section 3.3). 
However, whilst the deflection angle procedure and other 
identifiable parts of the intended large transport 
property program can be implemented in an on-line mode, 
it is not practical to aim at including the complete 
program in this way; instead we aim to be able to 
initiate it via a remote job entry, similar to the 
graphplotting program (section 3.3). 

We hope further to encourage the future use of our 
system by allowing users not just to avail themselves of 
the fixed number of operations that we have incorporated 
but also to be able to extract and, if necessary, 
manipulate our data up to the point where they can 
then store it in their own personal files and apply their 
own suite of programs to it. 

In conclusion, a significant new dimension to the 
system is beginning to blossom. No longer do we think 
of our system just as a tool for storing numbers; rather 
we believe the true value of the system lies in the dynamic 
concept of allowing the user to apply his own initiative 
in obtaining further data. The desirability of such 
features can only be judged by experience. 

r oG ^3, The Queen's University of Belfast, N. Ireland. 



1. INTRODUCTION 

We have been developing at Queen's, Belfast, for nearly five 
years, a reference retrieval system and a numerical data system. 
Rather than provide a service which caters for batch-type 
profiles, we aim at offering a flexible service to the user who 
is prepared to build up his own profile interactively at the 
terminal. I shall first outline our reference retrieval project 
and then the numerical data retrieval project. 

2. QUOBIRD REFERENCE PROJECT 

We began by first scrutinizing the literature, but even as 
recently as four or five years ago, there did not appear to 
be much advice to be found. We then decided to develop 
our own system; in that way we hoped to discover the 
problems involved for ourselves as our system evolved. The 
first result of our efforts was the implementation of an 
on-line experimental system, which has been operational for 
over three years, and which we have been developing 
ever since. 

The essence of the indexing side of the system consists of a 
set of inverted-file and abstract-file. Although we retrieve 
text for a variety of scientific subjects our main data 
base today consists of approximately 3,500 atomic and 
molecular physics abstracts along with their associated 
references. These abstracts are extracted directly from bi- 
monthly Inspec tapes, which we have been receiving for the 
past 18 months or so. Prior to this our content matter was 
taken from scientific books, namely their titles and chapter 
headings together with the bibliographic details of each book. 
We first indexed our local departmental computer science 
library and since then we have indexed similar libraries in 
physics and applied mathematics. Alongside the indexing of 
the latter libraries we began extracting papers taken from 
Inspec tapes, treating the sentences of the abstracts belonging 
to the papers in a similar manner to that already adopted for 
the chapter headings of the books. In this way the basic indexing 
design needed little change. A list of the Inspec tape subject 
headings which we use is given in Figure 1. 



13.00   Atomic and Molecular Physics 
13.20   Atoms 
13.23   Hydrogen and Helium Atoms 
13.25   Isotopes 
13.30   Molecules 
13.31   Inorganic Molecules 
13.37   Intermolecular Mechanics 

We have attempted to implement our experimental 
pilot system under the following design criteria: 

(a) Self-instructive 
(b) Simple to use 
(c) Rapid response 
(d) Effective to use 
(e) Minimum cost 
(f) Efficiency of storage; minimisation of disc accesses 

FIGURE 2. 



For detailed descriptions of these headings I refer you 
elsewhere (Refs. 1 & 2). It is my intention to outline the 
facilities offered by the system and to indicate how they 
are influencing the on-line conversational search language. 

The best way to do this is to show you a brief illustration 
(Figure 3) of what a user might experience during a typical 
search. I have chosen as a data base the atomic and molecular 
physics abstracts extracted from the Inspec tapes and stored 
on our fixed disc store. Although the system is aimed at 
being self-explanatory, I will first outline the facilities that 
are offered. 

The title and abstract records (or title and chapter headings 
records) can be retrieved using a key phrase consisting of up 
to 8 words and the possible alternatives offered are: 

(a) a further modification of the user's key phrase to expand 
or contract the set of documents retrieved; 

(b) the set of sentences (or chapter headings) containing 
the keywords; 

(c) the complete abstract of the documents; 

(d) the citations of the documents, i.e. the title, author, 
publisher, volume and issue number. 
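
The way a key phrase retrieves documents from an inverted file, and the way a new key is combined with the previous one, can be sketched with ordinary set operations. The index contents, document numbers and function names below are invented for illustration; only the eight-word limit and the intersection/union options come from the description above.

```python
def search(index, phrase):
    """Documents containing every word of a key phrase of up to 8 words."""
    words = phrase.upper().split()
    if not 1 <= len(words) <= 8:
        raise ValueError("a key phrase consists of 1 to 8 words")
    return set.intersection(*(index.get(w, set()) for w in words))


def combine(previous, new):
    """Document sets behind the options N (new), I, P (previous) and U."""
    return {"N": new, "I": previous & new, "P": previous, "U": previous | new}


# A tiny invented inverted file: word -> set of document numbers.
index = {
    "CROSS": {1, 2, 3, 5}, "SECTION": {1, 2, 3, 4},
    "HYDROGEN": {2, 3, 6}, "ATOM": {2, 6, 7},
}
previous = search(index, "cross section")
new = search(index, "hydrogen atom")
options = combine(previous, new)
```

The counts always satisfy union = previous + new - intersection, which is the relation visible in the sample search (107 + 27 - 8 = 126 documents).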

These points can be seen in the following example: 



FIGURE 1. Inspec Headings for Physics Classification 

We have in a special report, SRC, an up-to-date alphabetical 
print-out of the dictionary as it exists in our atomic and molecular 
physics data base file at present, along with some additional 
statistical information. 



* Paper read at the NATO Summer School on On-line Mechanised 
Information Retrieval Systems. 



    BIRD : KEY? 
         : CROSS SECTION 
    BIRD : KEY: 107 DOCS / 206 HDS 
    BIRD : R, H, T OR M? 
         : M 
    BIRD : KEY? 
         : HYDROGEN ATOM 
    BIRD : NEW KEY: 27 DOCS / 43 HDS 
    BIRD : INTERSECTION: 8 DOCS / 21 HDS 
    BIRD : UNION: 126 DOCS / 244 HDS 
    BIRD : N, I, P OR U? 
         : I 
    BIRD : R, H, T OR M? 
         : M 
    BIRD : KEY? 
         : PROTON IMPACT 

         : ELECTRON CAPTURE 

    BIRD : N, I, P OR U? 
         : I 
    BIRD : R, H, T OR M? 
         : T 
    BIRD : PRINT OUT OF ABSTRACT 

(The options at the prompt "R, H, T OR M?" were: 
    R: reference 
    H: headings 
    T: text of abstract 
    M: more keys) 

(The options at the prompt "N, I, P OR U?" were: 
    N: documents of new key 
    I: intersection of documents 
    P: documents for previous key 
    U: documents for new and previous key) 

FIGURE 3. SAMPLE OF BIRD PROGRAM 
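
The way the N, I, P and U options combine the documents of the previous 
and new keys can be pictured with ordinary set operations. The following 
is only a minimal sketch in Python, not the BIRD program itself: the 
function name `combine` and the document numbers are assumptions, chosen 
so that the counts agree with those printed in Figure 3 (107 and 27 
documents, 8 in common, 126 in the union). 

```python
def combine(previous, new, option):
    """Return the document set selected by one of the N, I, P, U options."""
    if option == "N":      # documents of the new key only
        return set(new)
    if option == "I":      # documents common to both keys
        return previous & new
    if option == "P":      # documents of the previous key only
        return set(previous)
    if option == "U":      # documents of either key
        return previous | new
    raise ValueError("option must be N, I, P or U")


# Hypothetical document numbers, arranged to reproduce the Figure 3 counts.
cross_section = set(range(0, 107))     # 107 documents
hydrogen_atom = set(range(99, 126))    # 27 documents, 8 shared with the above

for option in "NIPU":
    print(option, len(combine(cross_section, hydrogen_atom, option)))
```

Because each choice yields a plain set, the result can itself serve as the 
"previous" set when a further key is typed after M. 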



3. THE DAMP NUMERICAL PROJECT 



[illegible] we believed [illegible] to a scientist. Frequently 
[illegible] numerical data [illegible] is really required. To take 
[illegible] facility a [illegible] system could offer would be 
[illegible] in the first place. After four years' experience 
[illegible] the subject selected is [illegible] interatomic 
potentials, as [illegible] for our research. Since we had a 
[illegible] department specialised in atomic and molecular physics 
we felt we might be able to offer some sort of local experimental 
service to its members, who in turn could help us by providing 
critical feedback from the system. Now the beginnings of such a 
system are operational and have been for two years. Like the 
reference retrieval system, the operation of it requires little or 
no knowledge on the part of the user, who communicates by answering 
multiple-choice questions [illegible]. The system allows retrieval 
from a data bank of now over 300 interatomic potentials. The design 
of this system [illegible] I now briefly outline and then mention 
some of the [illegible]. For more detailed information I refer you 
to the annual report (Ref. 3) and to Ref. 4. 

Each pair of atoms can be in any of an infinite number of states, 
the unexcited state being classed as the 'ground state'. For each 
state there is a potential function V(r), 0 < r < oo, which 
represents the forces between the two atoms. Approximations 
to V(r) for different ranges of r are obtained by both 
theoretical and experimental means. There are many approximations 
in different ranges of r by different people and to 
different accuracies. An illustration of what a potential looks 
like can be seen in Figure 4. In the data bank are stored all the 
fits to the potentials that can be obtained from the literature 
[illegible] the Physics Abstracts, those 
[illegible] scanned to date. The fits can be tabular sets 
of values or coefficients to be inserted in pre-defined formulae. 
The data is stored in atomic units but it can be retrieved in any 
desired units together with relevant information like accuracy, 
range of validity and source. 
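
Storing everything in atomic units and converting only on output keeps 
the data bank free of unit bookkeeping. The sketch below illustrates the 
idea in Python under stated assumptions: the unit labels "EV" and 
"ANGSTROM" and the function `convert_table` are hypothetical and are not 
taken from DAMP; the conversion factors are the standard values of the 
hartree in electron volts and the bohr in angstroms, rounded. 

```python
# Standard conversion factors from atomic units (rounded).
HARTREE_TO_EV = 27.211386       # 1 hartree in electron volts
BOHR_TO_ANGSTROM = 0.529177     # 1 bohr in angstroms

# Hypothetical unit labels a user might type; "A.U." means no conversion.
ENERGY_FACTORS = {"A.U.": 1.0, "EV": HARTREE_TO_EV}
LENGTH_FACTORS = {"A.U.": 1.0, "ANGSTROM": BOHR_TO_ANGSTROM}


def convert_table(points, energy_unit="A.U.", length_unit="A.U."):
    """Convert (r, V) pairs stored in atomic units to the requested units."""
    e = ENERGY_FACTORS[energy_unit]
    l = LENGTH_FACTORS[length_unit]
    return [(r * l, v * e) for r, v in points]


stored = [(1.0, 0.9647)]        # one tabulated point, in atomic units
print(convert_table(stored, "EV", "ANGSTROM"))
```

Adding a further unit then only means adding one entry to a table of 
factors; the stored fits themselves are never rewritten. 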

The system in its present form allows for the retrieval of any of 
the stored potentials. As well as the ranges and accuracies 
reported, the units can be changed to the needs of the user. 
The user can have an off-line graph of the potential if he wants 
it and/or a pseudoplot of it on-line. Presently programs to 
derive useful results from the potentials, like the transport 
properties of gases, are being included. Such programs are 
already in existence but before they can be used a means has 
to be found to derive from the different fits a smooth representation 
of the potential over the whole range 0 < r < oo. 
This curve fitting problem illustrates the kind of problems we are 
experiencing in this system, but once again we remain confident. 

An illustration of the pilot on-line system is given in Figure 5. 
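
To make the curve fitting problem concrete, the sketch below stores each 
fit with its range of validity and an accuracy estimate, and answers a 
request for the "best" value at a given r by taking the most accurate fit 
that is valid there. Everything in it is an assumption made for 
illustration: the class `Fit`, the function `best_value`, the two 
functional forms (an exponential repulsion and an inverse sixth power 
attraction) and all parameter values are hypothetical and are not the 
forms or data held in DAMP. 

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fit:
    source: str                          # literature reference (placeholder)
    r_min: float                         # lower limit of validity, a.u.
    r_max: float                         # upper limit of validity, a.u.
    accuracy: float                      # estimated relative error
    evaluate: Callable[[float], float]   # V(r) in atomic units


def best_value(fits: List[Fit], r: float) -> Optional[float]:
    """Value from the most accurate fit valid at r, or None if none applies."""
    valid = [f for f in fits if f.r_min <= r <= f.r_max]
    if not valid:
        return None
    return min(valid, key=lambda f: f.accuracy).evaluate(r)


# Two hypothetical fits with overlapping ranges of validity.
fits = [
    Fit("hypothetical short-range fit", 0.5, 2.0, 0.05,
        lambda r: 10.0 * math.exp(-2.0 * r)),
    Fit("hypothetical long-range fit", 1.5, 50.0, 0.01,
        lambda r: -5.0 / r ** 6),
]

for r in (1.0, 1.8, 100.0):
    print(r, best_value(fits, r))
```

Switching from one fit to another in this way generally leaves a jump in 
the value where the ranges meet, which is exactly why a smooth 
representation over the whole range has to be derived before the 
transport property programs can use the data. 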




FIGURE 4. Typical shape of an interatomic potential. 

The dashed lines give a rough idea of the 
boundaries between the short, intermediate, 
and long range regions of the potential. 



    DAMP : TYPE YOUR TWO ATOMS 
         : LI H 
    DAMP : LIST OF STATES STORED FOR LI - H 
    DAMP :  1 LIH X 1 SIGMA 
    DAMP :  2 LIH B 1 PI 
    DAMP :  3 LIH A 1 SIGMA + 
    DAMP : 99 ANY OTHER STATE 
    DAMP : WHICH? 
         : 2 
    DAMP : ENERGY AND LENGTH UNITS? 
         : A.U. A.U. 
    DAMP : OPTIONS? 
         : ? 
    DAMP : THE OPTIONS ARE 
    DAMP : 1 PARTICULAR FIT(S) 
    DAMP : 2 BEST POTENTIAL 
    DAMP : 3 ALL THE FITS STORED 
    DAMP : 4 VALUES IN A CHOSEN RANGE 
         : 2 
    DAMP : GRAPH? 
         : NO 
    DAMP : NO. OF POINTS AND RANGE? 
         : 10 1.0 4.0 
    DAMP : VALUES OF BEST FIT 
    DAMP : R(A.U.)       V(A.U.) 
    DAMP : 0.1000E 01    0.9647E 00 

    DAMP : PSEUDOPLOT? 
         : YES 
    DAMP : 0.9647E 00 
    DAMP : 0.5127E 00 .... 
    DAMP : 0.6070E-01 .... 

    DAMP : MANIPULATE? 
         : NO 

FIGURE 5. Example of DAMP Program 



In conclusion, the data system is not in as advanced a stage of 
development as the reference system, which has had several 
trials, both local and abroad. This experience has helped us 
realise the type of user [illegible] that is required, and we now 
[illegible] that our indexing approach has to be greatly improved. 
REPORT TO OSTI ON THE NATO SUMMER SCHOOL ON ON-LINE 
INFORMATION RETRIEVAL SYSTEMS 

attended by 

E D Barraclough, Newcastle University, L Higgins, Queens University, 
Belfast, I McCracken, UKCIS, Nottingham University, C J van Rijsbergen, 
Cambridge University 

The Organisation of the Course 

The Summer School was split into two parts; the first week consisted of formal 
presentations by invited speakers. These were very much a survey of the state 
of the art of on-line systems at the present time. During this first week very 
little time was allowed for discussion and only questions of clarification 
were answered during the lectures. In practice most of the information 
presented was not controversial so this did not raise too many problems. 

In the second week brief presentations were made by members of the course, 
generally on work that was actually in progress. The remainder of the week was 
then spent on panel discussions covering the main areas of the subject. Three 
of the OSTI supported participants gave talks on the work they were doing 
[illegible] panels. In the evenings demonstrations were arranged of the 
working on-line systems. NLM's Medline was demonstrated on-line to Sweden; the 
Culham system was demonstrated from a video terminal over a dial-up line to 
Culham Laboratory; the Newcastle Medlars system was demonstrated from a 2741 
to the computer at Newcastle; and the Datacentralen system was demonstrated 
to the computer in central Copenhagen. A demonstration of both the Belfast 
reference retrieval system and numerical data system was also given using the 
Culham video terminal. It was not possible to link up to the Spires system at 
Stanford due to technical [illegible] between the modems in Europe and those 
in the US. The formal part of the course was thus very well organised. 
However, it is generally the informal part that proves most valuable, and here 
we suffered as there were no communal [illegible] facilities at the student 
home where we were staying. As a result the group tended to disperse in the 
evenings; in particular most of the lecturers disappeared into Copenhagen and 
were not available for informal discussions. Proceedings of the Summer School 
will be published in due course. 

The main areas covered 

The formal presentations could be divided into a section on the theory behind 
information retrieval systems, which was mainly concerned with clustering 
techniques; secondly the hardware available in current computing systems and 
its impact on information retrieval systems. Here the emphasis was on the cost 
and speeds of storage media and transmission facilities. The third topic was 
the design and implementation of the systems, where the concern was mainly with 
the user interface. The last major area was the management and the evaluation 
of these systems. 

Theoretical Aspects 

Conventionally the information retrieval literature distinguishes broadly 
between three types of file organisation: sequential, inverted and clustered. 
The distinction is often convenient but can be misleading. Each type is in 
fact a special case of a clustered file-structure. 

A clustered file structure consists of two things, a set of clusters and a 
set of cluster representatives (commonly called centroids) where a cluster 
representative characterises (summarises) the cluster. A moment's thought 
will show that an inverted file is a primitive clustering. In fact it is a 
clustering in which each cluster is represented by one and only one index 
term. The clusters also overlap to an arbitrary extent. Similarly, a sequential 
file is an extreme case of a clustered file - each cluster contains only one 
document and is represented by the index terms of that document. The point here 
is that a clustered file is not different in kind from an inverted file but in 
degree. 
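
The claim that inverted and sequential files are both special cases of a 
clustered file can be checked on a toy collection. In the Python sketch 
below a clustered file is taken to be a list of (representative, members) 
pairs; this representation, the function names and the three sample 
documents are assumptions made for illustration only. 

```python
# A tiny hypothetical collection: document identifier -> set of index terms.
docs = {
    "d1": {"atom", "scattering"},
    "d2": {"atom", "hydrogen"},
    "d3": {"scattering"},
}


def as_inverted_file(docs):
    """One cluster per index term; the representative is that single term."""
    clusters = {}
    for doc_id, terms in docs.items():
        for term in terms:
            clusters.setdefault(term, set()).add(doc_id)
    return [(frozenset([term]), members)
            for term, members in sorted(clusters.items())]


def as_sequential_file(docs):
    """One cluster per document; the representative is its own term set."""
    return [(frozenset(terms), {doc_id}) for doc_id, terms in docs.items()]


for representative, members in as_inverted_file(docs):
    print(sorted(representative), sorted(members))
```

In the inverted case the clusters overlap (d1 appears under both "atom" 
and "scattering"); in the sequential case every cluster has exactly one 
member. A general clustered file sits between these two extremes. 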

The debate as to which of these file structures to use for on-line document 
retrieval has now centred about the inverted versus the clustered file. It is 
accepted now that for on-line document retrieval sequential files are inadequate. 
The response time is too slow although the effectiveness may be greater than that 
achieved with an inverted file. So, a major advantage of the inverted file is 
retrieval speed. However, it has been shown that retrieval based on 
clustering can achieve the effectiveness of a linear search 
followed by ranking. Potentially cluster-based retrieval is more effective than 
any ranking method. Intuitively this follows from the fact that clustering brings 
together documents relevant to the same queries while at the same time separating 
the relevant from the non-relevant. The experimental results supporting this 
claim must be viewed with caution since they have only been obtained on 
relatively small data-bases. 
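
The difference between a linear search followed by ranking and 
cluster-based retrieval can be sketched as follows. This is only an 
illustration under simplifying assumptions: the matching function is a 
plain count of shared terms, each cluster representative is the union of 
its members' terms, and the four documents and two clusters are 
hypothetical. 

```python
def overlap(a, b):
    """Simple matching function: the number of terms two sets share."""
    return len(a & b)


def linear_search(query, docs):
    """Compare the query with every document, then rank the matches."""
    matches = [d for d in docs if overlap(query, docs[d]) > 0]
    return sorted(matches, key=lambda d: (-overlap(query, docs[d]), d))


def cluster_search(query, clusters, docs):
    """Compare the query with the representatives only, then rank the
    members of the best-matching cluster."""
    _, members = max(clusters, key=lambda c: overlap(query, c[0]))
    matches = [d for d in members if overlap(query, docs[d]) > 0]
    return sorted(matches, key=lambda d: (-overlap(query, docs[d]), d))


docs = {
    "d1": {"hydrogen", "atom", "cross", "section"},
    "d2": {"hydrogen", "atom", "spectrum"},
    "d3": {"argon", "emission", "lines"},
    "d4": {"argon", "laser"},
}
clusters = [
    (docs["d1"] | docs["d2"], ["d1", "d2"]),
    (docs["d3"] | docs["d4"], ["d3", "d4"]),
]
query = {"hydrogen", "cross", "section"}
print(linear_search(query, docs))
print(cluster_search(query, clusters, docs))
```

On this toy collection both searches return the same ranked list, but the 
cluster search compares the query with two representatives and two 
documents rather than with all four documents. Whether this advantage 
carries over to large data-bases is the open question raised above. 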

Professor Salton criticised the use of inverted files on a number of grounds. 
The first of these (in which he was supported by Mr Cleverdon) is that inverted 
files are nothing more than glorified peek-a-boo systems. The implication being 
that we are not exploiting computer technology to the fullest extent but are 
using the same techniques for information retrieval as were used before computers 
were introduced. The second objection is that inverted files limit one to Boolean 
searches whereas it has now been established that ranking methods, using 
sophisticated matching functions, are more effective. Thirdly it is impossible 
to implement feedback procedures when operating on an inverted file. Fourthly 
one is stuck with a static indexing system since it is costly to update an 
inverted file with respect to indexing. Professor Salton claims that the answer 
to all these problems lies in automatic document clustering. Unfortunately 
clustered file structures have only been tried on a relatively small scale, 
and only in an experimental environment. It would seem that the testing of 
automatic document clustering on a large data-base is long overdue. It is true 
that clustered files require an initial investment of order n log n to 
[illegible] in time for their construction, but then the construction of a 
flexible inverted file is not cheap either. The extra performance and 
flexibility achieved from a clustered file would seem worth it. 

Nevertheless, it was not universally accepted that clustering was necessarily 
the best method of structuring files. Some people felt it dangerous to let 
users have control of the data base structure without completely understanding 
it. Two large operational systems using inverted file techniques were those 
illustrated by Professor Parker using the Spires system and by Dr Katter and 
Mr McCarn using the Medlars system. 

Madame Wolff-Terroine gave a survey of some clustering methods. Unfortunately 
the survey was very sketchy and did not contain any of the theoretical results 
obtained in the last three to four years. She discussed her use of clustering 
in keyword classification which was mainly based on the work done by K Sparck 
Jones. Unfortunately no attempt was made to evaluate the experimental work 
except by visual inspection of the classifications. 

Madame Wolff-Terroine in her presentation also hinted at the difficulty of 
automatically finding the content units to describe a document. Mr Cleverdon 
elaborated on this difficulty by stating that it was not possible to consider 
the specificity of the content units independent of the level of exhaustivity 
of the document description. 

Hardware 

Dr Helms from the Computing Centre gave a series of talks on the capabilities 
of both the hardware and software of present computing systems. He estimated 
that in Europe we were still two years behind the US. One area where vast 
improvement could be foreseen is in the provision of large computer stores. 
For example, a store of 10^12 bits is quoted as costing $10^6. The problem with 
information retrieval systems is not only the size of the store required but 
the data transfer rate between the storage device, the computer and the user. 
Matching these is the problem of the software designer. Dr Helms quoted some 
figures giving the times in man-years required to implement operating systems. 
For example, IBM's OS system took 5000 man-years of effort to reach its 
present state. The complexity of such a system can affect its reliability 
and when one is running an on-line system with many remote users reliability 
is all important. Unfortunately for designers of on-line retrieval systems 
it is not possible to control the operating system that is being used. This 
rather than the information retrieval system itself could well be the major 
area of difficulty. 

Design and Implementation 

Dr Katter of Systems Development Corporation described the requirements for 
the design of an on-line system. He considered this from a commercial point 
of view in that they were concerned not only to provide a working system that 
was attractive to the user but also to make such a system economically viable. 
The main topics that were considered were the maintenance of the data-base, that 
is how to validate the data being added and how to control the size of the 
data-base by selecting items to be purged. This last problem really had no 
satisfactory solution. Also on the control side one required statistics showing 
the usage of the system. One needed facilities for file security and in a 
commercial system for accounting and billing. For the user of the system one 
clearly had to provide searching capability and here the interface with the 
user was all important. The user also needed the facility to print a sample 
of the citations on the file and options in the form of the output. Many of 
the systems being demonstrated showed the facilities that were described. 

Management and Evaluation 

Two speakers covered these topics: Mr McCarn from the National Library of 
Medicine was concerned with the management of a large system and Professor 
Lancaster from the University of Illinois talked about evaluation methods. The 
main problems with management of such systems are in the communications area. 
With many users spread throughout the United States they could not afford to 
contact individuals in the case of a breakdown of the system. Nor could they 
afford to use the normal telephone network for communicating over such vast 
distances. Both these problems have very little to do with the computer or the 
software system that is running on it. They are almost entirely a communications 
problem arising from the fact that data communication of this type takes a much 
longer time and uses a telephone line very inefficiently compared with normal 
speech. The solution that they are attempting in the States is the provision 
of networks of lines controlled by small computers which can pack messages and 
thus communicate much more efficiently. This also partially solves the problem 
of machine breakdown in that the user can get information from the 
communications network or he can be transferred to a different machine. In 
the United Kingdom there are no working networks covering a wide area and 
the Post Office's plans for such a system are very remote. The need for 
better and cheaper communications is obvious when one considers the costs 
of running a search on the Medline system. This dropped to as low as £5 if 
10 people were using it simultaneously. 

Professor Lancaster gave a survey of the various on-line systems available 
and the requirements from an on-line system. He felt that despite the 
inaccuracy in the measurements of precision and recall these were the only 
measurements that could be used for testing systems and maintained that a 
user on-line could by sampling the file increase his precision and save a 
lot of computer time by not doing abortive searches. He advocated ranking 
the output so that the user saw the most specific documents first. This was 
particularly important in the on-line system where the number of documents a 
user would see would be relatively small. 
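
For readers unfamiliar with the two measures, precision is the fraction of 
the retrieved documents that are relevant and recall is the fraction of 
the relevant documents that are retrieved. The short Python sketch below 
computes both for one search; the function name and the document numbers 
are made up for illustration. 

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {1, 2, 3, 4}     # documents shown to the user (hypothetical)
relevant = {2, 4, 5}         # documents judged relevant (hypothetical)
print(precision_recall(retrieved, relevant))
```

The inaccuracy mentioned above arises mainly in the denominator of 
recall, since the full set of relevant documents in a large file is 
rarely known and has to be estimated. 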

Panel Discussions 

Four main areas of on-line systems were discussed, first the training of users 
and here there was some difference of opinion concerning the use of computer 
aided instruction techniques for this type of system. At present the only 
people actually training users are Medline, where the National Library of 
Medicine spends three weeks training librarians. Most of this was not spent on 
the computer system but rather on understanding the MeSH vocabulary and the 
indexing requirements. The librarians attending the Summer School felt that 
it should be possible to train for the general use of on-line systems and not 
for a particular system. 

The next panel discussion was concerned with the interface between the user and 
the system. There seemed to be no panacea and no clear way of distinguishing a 
good system from a bad system. Devising an experiment to compare two systems 
would be very difficult. 

File organisation was another topic and the only lesson to be learnt was that 
different types of file are suitable for different file sizes and methods of 
use. No-one had done any cost analysis relating to file type and size. This was 
always left to the system implementor and at present the amount of theory is 
very limited. The final panel discussion was on cooperation between libraries. 
The situation in the United States seems even less hopeful than it is here, one 
of the reasons perhaps being that they have too much money. A plea was put in 
for the extension of British MARC to cover the European literature as well. 
Cooperation on on-line systems seems only possible on a cost basis for the 
libraries in one locality. 

The Future of on-line Systems 

It was clear by the end of the Summer School that on-line systems where the 
user interrogated the system himself had come to stay. The most fruitful area 
for research would seem to be in designing systems for the interrogation of 
more than one data base. It was also felt that the user did not need to have 
access to the complete retrospective file. He wanted only a few relevant 
references to begin with. Retrospective searching could then be done if 
necessary at a later date on a batch system. 

Areas requiring further investigation 

It was apparent that there was a need to test the theory of information 
systems in a real situation. Clustered files are potentially more flexible 
and effective than inverted files. The evidence for this is pretty slim so 
far being based only on small files. It is essential that more research is 
done to prove (or disprove) this claim. One way this can be done is by 
mounting a large scale automatic clustering experiment. 

The user oriented research for which a need became apparent was on the 
application of on-line systems to more than one data base. There seemed to 
be two levels that could be distinguished: the actual interrogation of a large 
data base from a formulated query, and the assistance for the user during the 
formulation process. At present attempts are made to include both facilities 
in one system with the emphasis on different aspects according to the apparent 
needs of the users. For example, the Medline system concentrated initially on 
interrogation of the data base while the Newcastle Medlars system was concerned 
with user assistance. For large scale systems it is clearly more efficient to 
keep the tutorial aids to a minimum, thus reducing the message processing 
requirements. 

A method of overcoming this conflict in requirements is to provide tutorial 
aids on a satellite computer system. Thus, the user would for tutorial purposes 
interact with the satellite computer, and would only interrogate the central 
data bank when he had reached a predefined level of proficiency. Figure 1 shows 
such a dual system: 

            Central Data Bank System 
                       | 
    Satellite computer system with tutorial aids 
                       | 
                     User 

                   Figure 1 

The flexibility that this approach can give is best illustrated when the central 
data-bank system comprises several data-bases as shown in Figure 2 below: 



    Central Data Bank System, comprising: 
        CAS data base 
        INSPEC data base 
        BA data base 
        MEDLARS data base 
              |                              | 
    Satellite computer system      Satellite computer system 
    with tutorial aids for         with tutorial aids for 
    chemists                       medics 

                        Figure 2 

Each of the above satellite systems can be independently developed to enable 
a user - chemist or medic - to interface to any of the data-bases within the 
central data-bank using the language of his own subject area. Note that each 
satellite system will probably employ similar initial tutorial aids such as 
data-base description. Further advantages that can be realised by this 
approach are as follows: 

a) Satellite systems can be developed as user needs dictate without affecting 
the operations of the central data-bank system and satellites already being 
served by it. 

b) Satellite systems, like the Newcastle University on-line Medlars system 
and Belfast's on-line Inspec system, have demonstrated that such systems 
can be developed and tested before the large central data-bank system is 
developed. Therefore there is no reason why development should not begin 
on satellite systems for other data-bases/user groups. 

Problem Areas 

A problem facing any designer of an on-line system is that of the availability 
of telecommunications software. For users of the current range of ICL computers 
this software is primitive. To develop an effective on-line system with such 
computers will require considerable resources to bring the basic 
telecommunications software up to an operational level. 

There are two major reasons why existing manpower resources should not be 
utilised in this manner and they are: 

a) ICL are in the process of producing a replacement for the current range of 
computers and this may render any telecommunications software developed 
obsolete on the new range of machines. 

b) Telecommunications software development is an area of systems design and 
programming which should be viewed by us in much the same way as we view 
the development of compilers and operating systems. 

It would be more fruitful to employ whatever manpower resources there are 
available in developing those aspects of on-line information retrieval which 
are germane to our current expertise and interest. This can only be achieved 
if such development work is carried out on hardware which has adequate 
telecommunications software support. Adopting this policy, we can still retain 
the initiative that systems such as the Newcastle on-line Medlars system have 
given us. 

The other problem that we are faced with in this country is the lack of data 
communication facilities at a reasonable price. It would be possible to design 
a data network and predict what the cost of its use would be but as we are 
wholly dependent upon the GPO for communications facilities such a network 
must be many years off. In the meantime we perhaps have to accept that on-line 
searching is going to be expensive but if we are going to retain any expertise 
in this field we must continue with this work. 
* The BIRD On-line Retrieval System 

L. Higgins 
Department of Computer Science 

At Belfast we have been developing an on-line reference retrieval 
system, BIRD. At present (September 1972) the BIRD data base allows 
retrieval from more than 5,000 recent papers taken from the physics abstracts 
section of the bi-monthly Inspec tapes on Atomic and Molecular Physics. 
A secondary data base allows retrieval from the books in three local 
departmental libraries, namely numerical mathematics and computer science, 
physics, and applied mathematics - altogether about 1,000 books. 

The talk proposes to outline the facilities BIRD offers to users, 
to discuss some of the problems that have arisen in implementing and 
maintaining BIRD, and to offer some guidelines of 'user needs' as 
experienced by users of BIRD. 



* Paper read at the Universities Computer Science Colloquium 
held in Edinburgh from September 19th to 22nd, 1972. 
The BIRD [illegible], on-line, [illegible]. 
Our programs continue to add new features and to increase overall system 
efficiency. From the beginning consideration was given to cheaper disc 
storage space becoming available, to national telecommunication networks 
being encouraged, and even in the longer term, to new technologies, such 
as laser storage techniques and microwave transmission. 

Rather than provide a current awareness or retrospective service 
and cater for static user profiles, we are offering a flexible 
[illegible] to the user who is prepared to build up his own profile 
[illegible] time. The system in its present form can retrieve 
information from over 5000 abstract Inspec physics records and also from 
books in three departmental libraries, namely, [illegible] numerical and 
computer science books, [illegible] physics books, and [illegible] applied 
mathematics books. A [illegible] of the overall system can be seen in the 
following illustration: 



QUOBIRD SYSTEM OVERVIEW 






Circle (1) shows what data is selected for the primary Inspec 
data file and for the secondary book data file. I would like to draw your 
attention for a moment to another illustration which shows more clearly the 
subject classification headings that we extract from the bi-monthly Inspec 
tapes which we have been receiving for the past eighteen months: 




Returning now to the former illustration: 



QUOBIRD SYSTEM OVERVIEW 



Circle (2) shows the actual data itself that is extracted from each 
book and from each Inspec tape record. 

Circle (3) shows the machine readable format of the data before being 
transferred to disc. The information from the books is punched onto cards 
in an NIC type format. I think it is worth mentioning here for the benefit of 
those familiar with NIC that we do not impose the constraints [illegible] by 
the NIC format. By that I mean that all the fields within each [illegible] 
unit record are variable lengths. This, of course, is more in accordance with 
the MARC type format in which the already established INSPEC data tape 
records are composed. I might add here that the reason for using NIC type 
format in the first place was that we thought at the beginning (when we 
started our whole project by indexing the books in the computer science 
departmental library) that we might want to use the NIC package system for 
producing catalogues and so forth. However, after closer consideration we 
did not think this worthwhile. 



Theoretical Aspects 

Circle (5) implemented by circle (4) shows the file organisation 
that makes up the QUOBIRD files. 



OVERALL FILE STRUCTURE 



This illustration shows the data file content and structure more clearly. 
It is a three-levelled inverted file processed by a hash indexing technique. 
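
As a rough picture of what a three-levelled inverted file reached by 
hashing might look like, consider the Python sketch below. It is not the 
QUOBIRD file layout, which is not described in detail here. The three 
levels chosen (a hashed dictionary of terms, posting lists of document and 
heading numbers, and the document records), the class and method names, 
the table size and the hash function are all assumptions made for 
illustration. 

```python
TABLE_SIZE = 11     # deliberately small so that chains are exercised


def bucket(term):
    """Toy hash function: sum of character codes modulo the table size."""
    return sum(ord(c) for c in term) % TABLE_SIZE


class ThreeLevelFile:
    def __init__(self):
        # Level 1: hashed dictionary, each bucket a chain of (term, index).
        self.dictionary = [[] for _ in range(TABLE_SIZE)]
        # Level 2: posting lists of (document number, heading number).
        self.postings = []
        # Level 3: document records (here just the citation).
        self.documents = {}

    def _posting_list(self, term):
        chain = self.dictionary[bucket(term)]
        for stored, index in chain:
            if stored == term:
                return self.postings[index]
        self.postings.append([])
        chain.append((term, len(self.postings) - 1))
        return self.postings[-1]

    def add(self, doc_no, citation, headings):
        self.documents[doc_no] = citation
        for heading_no, heading in enumerate(headings):
            for term in heading.upper().split():
                self._posting_list(term).append((doc_no, heading_no))

    def lookup(self, term):
        """Return (number of documents, number of headings) for a term."""
        term = term.upper()
        for stored, index in self.dictionary[bucket(term)]:
            if stored == term:
                hits = set(self.postings[index])
                return len({doc for doc, _ in hits}), len(hits)
        return 0, 0


f = ThreeLevelFile()
f.add(1, "hypothetical citation 1", ["Argon emission lines", "Argon laser"])
f.add(2, "hypothetical citation 2", ["Argon scattering"])
print("KEY: %d DOCS / %d HDS" % f.lookup("argon"))
```

The point of hashing is that a key typed at the terminal leads directly 
to its dictionary entry without a search through the whole dictionary, 
which helps to keep the number of disc accesses per key small. 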
Back again to the first illustration: 



QUOBIRD SYSTEM OVERVIEW 



Circles (6), (7) and (8) bring us back now to the on-line retrieval 
program, BIRD. 

The facilities offered by BIRD can only truly be judged by a real- 
time on-line demonstration. However, since this is not on today's 
programme allow me instead to tell you a little more about BIRD and then 
I will show you a sample printout of what a novice user might experience 
during a typical on-line INSPEC data base search and what the experienced 
user might achieve during the same search. 
The BIRD system is: 
Full-Text : 

Every word (except those on the indexer-judged "noise word exclusion 
list") is indexed at the word level as a searchable term. 

On-line : 

The BIRD system operates in a real-time environment and uses the 
ordinary GPO telephone line as the communications link between user and 
system. 






Interactive : 

The full-text concept and on-line mode of operation permit a 
high degree of interaction between the user and the information with 
which he is working. 
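
The full-text idea, in which every word outside a noise word exclusion 
list becomes a searchable term and a key phrase scores a hit only when 
all of its words occur in the same sentence or heading, can be sketched 
as follows. This is an illustrative Python sketch, not the BIRD code: the 
noise word list, the function names and the two sample texts are 
hypothetical. 

```python
import re

# Hypothetical noise word exclusion list.
NOISE_WORDS = {"THE", "OF", "IN", "A", "AND", "BY", "IS"}


def words_of(text):
    """Upper-case words of a text with the noise words removed."""
    return [w for w in re.findall(r"[A-Za-z]+", text.upper())
            if w not in NOISE_WORDS]


def index_text(doc_no, text, index):
    """Index every remaining word at the (document, sentence) level."""
    sentences = [s for s in re.split(r"[.!?]\s*", text) if s.strip()]
    for sent_no, sentence in enumerate(sentences):
        for word in words_of(sentence):
            index.setdefault(word, set()).add((doc_no, sent_no))


def search_phrase(phrase, index):
    """Sentences containing every word of the key phrase."""
    words = words_of(phrase)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


index = {}
index_text(1, "Resonance scattering of electrons by argon. "
              "The resonance is narrow.", index)
index_text(2, "Resonance in helium. Scattering of protons.", index)

hits = search_phrase("RESONANCE SCATTERING", index)
print(len({doc for doc, _ in hits}), "DOCS /", len(hits), "HDS")
```

Document 2 contains both words but never in the same sentence, so the 
phrase retrieves document 1 only, whereas either word typed alone 
retrieves both documents. 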



We have attempted to implement our present experimental pilot 
system under the following design criteria. 



DESIGN CRITERIA 






The user communicates with the system in ordinary English, and the 
dialogue guides him through each step of the search and retrieval 
process. For example, in response to the entry of a set of search 
terms, the system reports to the user the number of documents which 
satisfy his request and asks him whether he wishes to display the 
material or modify his search with additional terms. 

The system's man-machine interface has been designed for simple 
yet effective operation. User training is minimal and experience by 
users has shown that one search session of half an hour is sufficient 
to be familiar with the mechanics of the system. We have, in addition, 
a small user manual which is easily read prior to a search or, if need 
be, referred to during a search. Other features of the system include 
a user help command facility, where a user can ask for guidance if he 
doesn't understand a prompt. 
INSPEC HEADINGS FOR PHYSICS CLASSIFICATION 

13.00   ATOMIC AND MOLECULAR PHYSICS 
13.20   ATOMS 
13.23   HYDROGEN AND HELIUM ATOMS 
13.25   ISOTOPES 
13.30   MOLECULES 
13.31   INORGANIC MOLECULES 
13.37   INTERMOLECULAR MECHANICS 
DESIGN CRITERIA 

    SELF INSTRUCTIVE 
    SIMPLE TO USE 
    RAPID RESPONSE 
    EFFECTIVE TO USE 
    MINIMUM COST 
    EFFICIENCY OF STORAGE 
    MINIMISING DISC ACCESSES 
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Sample Display of a Novice Search 



BIRD: INTERACTIVE SUBJECT INDEX OF ATOMIC AND MOL. PHYSICS RECORDS
BIRD: ARE YOU FAMILIAR WITH THE SYSTEM?
    : NO
BIRD: TYPE "?" IF YOU NEED MORE INFORMATION AT ANY STAGE.
BIRD: TYPE "X" IF YOU GIVE UP AND WISH TO STOP THE PROGRAM AT ANY TIME
BIRD: TYPE "A" IF YOU WANT ANOTHER SEARCH AT ANY TIME
BIRD: KEY?
BIRD: TYPE A PHRASE, NO MORE THAN 8 WORDS. THIS (KEY) WILL BE USED AS A
BIRD: UNIT FOR COMPARISON IN A SEARCH OF TITLES AND SUB HEADINGS
BIRD: KEY?
    : ARGON
BIRD: KEY: 223 DOCS / 347 HDS
BIRD: R,S,AB OR M?
    : ?
BIRD: R: LIST OF REFERENCES REQUIRED
BIRD: S: HEADINGS THAT CONTAIN THE KEYS
BIRD: AB: FULL TEXT OF ABSTRACT
BIRD: M: MORE KEY WORDS TO BE INCLUDED TO LIMIT OR INCREASE THE
         NUMBER OF DOCUMENTS RETRIEVED
BIRD: R,S,AB OR M?
    : M
BIRD: KEY?
    : EMISSION LINES
BIRD: KEY: 60 DOCS / 78 HDS
BIRD: INTERSECTION: 4 DOCS / 22 HDS
BIRD: UNION: 279 DOCS / 419 HDS
BIRD: N,I,P OR U?
    : ?
BIRD: N: THE DOCUMENTS FOR THE NEW KEYS
BIRD: I: THE DOCUMENTS COMMON TO THESE AND PREVIOUS KEYS
BIRD: P: THE DOCUMENTS FOR THE PREVIOUS KEYS
BIRD: U: THE DOCUMENTS FOR BOTH THE NEW AND PREVIOUS KEYS
BIRD: N,I,P OR U?
    : I
BIRD: R,S,AB OR M?
    : X
BIRD: THANK YOU AND GOOD DAY
BIRD: DELETED:- OK
    : £ENDJOB
MCS : MCS (CORE: 640)
MCS : CONNECT TIME 6:06 MILL TIME 0:933 DISC TRANSFERS 51
MCS : LOGOUT LINE 2 PUBL ABED1234 12/37/54 15/09/72



Multi-Term Clarification 

Simplest and most precise search involves a query about a
subject which can be specifically described in a phrase,
e.g. dissociative electron attachment in carbon dioxide.

A constraint on the number of documents recalled is imposed
by using keywords in a phrase which must occur within a sentence
to be recorded as a hit, e.g.

    : RESONANCE SCATTERING
BIRD: KEY: 16 DOCS / 19 HDS
BIRD: KEY?
    : RESONANCE
BIRD: KEY: 198 DOCS / 301 HDS
BIRD: R,S,AB OR M?
    : M
BIRD: KEY?
    : SCATTERING
BIRD: KEY: 190 DOCS / 369 HDS
BIRD: INTERSECTION: 30 DOCS / 77 HDS
BIRD: UNION: 358 DOCS / 630 HDS
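
The difference between a phrase key and two separate keys can be illustrated with a short Python sketch. It assumes, as the text states, that all words of a phrase must fall within one sentence to count as a hit; the sentence-splitting rule, the function names and the two sample texts are hypothetical simplifications, not taken from BIRD.

```python
import re


def sentences(text: str) -> list[str]:
    """Split text into sentences on full stops (a deliberately crude rule)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def words(text: str) -> set[str]:
    """Return the set of upper-cased alphabetic words in the text."""
    return set(re.findall(r"[A-Z]+", text.upper()))


def phrase_hit(text: str, phrase: str) -> bool:
    """True if every word of the phrase occurs within a single sentence."""
    wanted = words(phrase)
    return any(wanted <= words(s) for s in sentences(text))


def keyword_hit(text: str, phrase: str) -> bool:
    """True if every word occurs somewhere in the text (separate keys, intersected)."""
    return words(phrase) <= words(text)


# Hypothetical abstracts, for illustration only.
doc_a = "Resonance scattering of electrons by helium. Results are tabulated."
doc_b = "A resonance is observed in the cross section. Elastic scattering is ignored."

for name, doc in (("A", doc_a), ("B", doc_b)):
    print(name, phrase_hit(doc, "RESONANCE SCATTERING"),
          keyword_hit(doc, "RESONANCE SCATTERING"))
```

Document B satisfies the two separate keys but not the phrase, which is the effect seen in the display: the phrase RESONANCE SCATTERING recalls 16 documents, while the intersection of RESONANCE and SCATTERING recalls 30.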



Sample Display of an Experienced Searcher 



BIRD: INTERACTIVE SUBJECT INDEX OF ATOMIC AND MOL. PHYSICS RECORDS
BIRD: ARE YOU FAMILIAR WITH THE SYSTEM?
    : YES
BIRD: KEY?
    : INERT GASES
BIRD: KEY: 37 DOCS / 52 HDS
BIRD: R,S,AB OR M?
BIRD: KEY?
    : ARGON
BIRD: KEY: 223 DOCS / 347 HDS
BIRD: INTERSECTION: 5 DOCS / 12 HDS
BIRD: UNION: 255 DOCS / 397 HDS
BIRD: N,I,P OR U?
    : U
BIRD: R,S,AB OR M?
    : M
BIRD: KEY?
    : ISOELECTRONIC EMISSION LINES
BIRD: KEY: 3 DOCS / 4 HDS
BIRD: INTERSECTION: 1 DOCS / 4 HDS
BIRD: UNION: 257 DOCS / 400 HDS
BIRD: N,I,P OR U?
    : I
BIRD: R,S,AB OR M?
    : S
BIRD: - TITLE:
BIRD: NEW OBSERVATIONS OF THE SPECTRA OF ARGON X TO XV AND OF ISOELECT
BIRD: RONIC EMISSION LINES IN SILICON VII TO X, PHOSPHORUS X, SULPHUR
BIRD: IX TO XII AND CHLORINE X TO XIV
BIRD: THIS PAPER REPORTS THE CLASSIFICATION OF SPECTRAL LINES OF CHLOR
BIRD: INE IX TO XIV AND OF ARGON X TO XV EMITTED FROM THE PLASMA FORME
BIRD: D IN A THETA PINCH
BIRD: THE WAVELENGTHS OF THE 2S/SUP 2/2P/SUP N/-2S2P/SUP N+1/ EMISSION
BIRD: LINES ENABLE THE CALCULATION OF GROUND TERM INTERVALS IN THE SO
BIRD: LAR ABUNDANT ELEMENTS SILICON, SULPHUR AND ARGON
BIRD: THE MEASURED INTERVAL IN ARGON XIV ADDS CONFIRMATION TO THE IDEN
MCS : BREAK: 6675 ON LINE LIMIT
    : £RESUME
BIRD: TIFICATION OF THE CORONAL FORBIDDEN LINE AT 4412 AA
BIRD: USEFUL?
    : YES
BIRD: R,AB OR M?
    : R
BIRD: "NEW OBSERVATIONS OF THE SPECTRA OF ARGON X TO XV AND OF ISOELECT
BIRD: RONIC EMISSION LINES IN SILICON VII TO X, PHOSPHORUS X, SULPHUR
BIRD: IX TO XII AND CHLORINE X TO XIV
BIRD: BY FAWCETT, B.C. GABRIEL, A.H. PAGET, T.M.
BIRD: REF.NO. =297750 PUBLISHED IN J. PHYS. B (GB) VOL. 4 NO. 7 JU
BIRD: LY 1971
BIRD: S,AB OR M?
    : X
BIRD: THANK YOU AND GOOD DAY
Problems in Implementation and Maintenance include:

(a) System Security Requirements 

(b) Text Editing Facilities Necessary 

(c) System Expansion Requirements 

(d) Hardware Failures 



Guidelines for Future User Needs:

Backup as well as move forward capabilities

Greater Flexibility

Dictionary lookup: Synonyms, Related Terms, Searchonyms, Hierarchical Display, Danger Words

HE: HELIUM
HE: Pronoun
L. D. Higgins and F. J. Smith: On-Line Subject Indexing and
Retrieval, Program, 1969, 3, pp. 147 - 56.

L. D. Higgins and F. J. Smith: Disc Access Algorithms, Computer
Journal, Vol.14, No.3, pp. 249 - 53, 1971.

M. Carville, L. D. Higgins and F. J. Smith: Interactive Reference
Retrieval in Large Files, Inform. Stor. Retr. Vol.7, pp. 205 - 210,
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The success or failure of a computerised on-line IR system depends 

1. Hardware/OS software backup

2. The indexing language adopted 

3. The indexing techniques involved 

4. Good User Interface needs 

5. System Evaluation 
Report on the Universities Computer Science Colloquium held in
Edinburgh from September 19th to 22nd, 1972.

Although the colloquium lasted only two and a half days there
was a heavy program covering a wide range of topics. These included
computer education, computer simulation, information retrieval,
programming languages and other miscellaneous topics. Probably the
most interesting parts of the programme were the invited lectures;
one by Professor Wirth on "Structured Programming" and another by
Professor M. Wilkes on "The Hardware/Software Interface".
Unfortunately another invited lecture on "Developments in the Theory
of Computation" to be given by Dr. P. Landin had to be cancelled.

For those of us from Belfast one of the highlights of the
programme was, of course, the talk on "The BIRD on-line reference
retrieval system" given by Larry Higgins. This was very well
received indeed. In addition, thanks to the Regional Computing
Centre in Edinburgh, we had the unexpected opportunity to put on a
live demonstration of the BIRD system in operation. This created a
lot of interest and quite a number of the delegates were given the
opportunity of operating the system themselves.



J. Boyle 

29th September 1972 
One of the most significant aspects of the 1972 conference was the
attention devoted to data in fields which have recently come within the scope
of CODATA, viz. Earth and Atmospheric sciences, Biological sciences
and Astrophysics, and Engineering. The inclusion of these new fields was not
universally welcomed and the point was voiced that the extension of CODATA
activities, and in particular its extension to non-quantitative data, could lead
to a dilution of effort which would have a detrimental effect in the
fields originally covered by CODATA.

Data Evaluation

As in the 1970 conference there was some discussion of
data evaluation although not as much. Possibly this was because it was felt
that some progress was being made or, in some cases, because of the
realisation that in some fields it is not possible, nor perhaps even desirable, to
divert the major effort which would be required into data evaluation.

A number of talks were devoted to the problem of setting up rules
of publication which could be used by editors, referees and authors to ensure
that the reader is informed of the reliability of data published in a given
article. Benson (session II) described the progress being made in the field
of Chemical Kinetics while Westrum (session II) suggested a three-levelled
structure of publication rules. On the highest level, applying to all fields,
would be the basic rules or "ten commandments" for publishing data. At an
intermediate level there would be rules applying to specific fields while at
the lowest level there would be detailed rules applying to particular areas
within a field. A CODATA task group is working on the basic set of rules.
The conference edition of CODATA NEWSLETTER (number 3) contained a copy of
a paper called "A guide to procedures for the publication of Thermodynamic
Data".



Collections 
The question was put as to whether in fact the effort devoted to the
collection of data is always directed as effectively as it might.
[illegible] in particular [illegible] who feel that they [illegible]
by existing data collections. [Name illegible] (session V) said that in
[illegible] they were considering setting up a data centre as an alternative
to a data bank. The function of the data centre would be to direct users to
where the data they were seeking could be found. From the use made of a
data centre it would be possible to determine whether in fact a data bank would
be worthwhile. [Name illegible] (session [illegible]) discussed the concept of an Information
Analysis Centre whose function is to compress and evaluate data [illegible]
to the dilution and pollution which occurs in the normal course of events.
The point was also made that the function of a data collection was not simply
to supply the user with data but also to highlight "holes" in known data and
to indicate potentially worthwhile areas of research.



Computers

Criticism was expressed of the tendency for people to jump on the computer
bandwagon and set up computerised data banks in cases where other forms of data
collections, such as books, might fulfil the needs of the scientific community
more efficiently. The point was made that the computer was most effective as
a means of storing data when, in addition to storing and retrieving the data,
the manipulative powers of the computer were used to process the data. Hilsenrath
(session VI) suggested that many quantities which were previously stored
in tabular form could now be most effectively stored by storing basic constants
and the software necessary to calculate the required quantities from the basic
constants.
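
The idea of storing basic constants together with the software that derives the required quantities can be illustrated with a small Python sketch. The choice of quantity (hydrogen energy levels in the simple Bohr model) and the function name are illustrative assumptions of this example and were not part of the talk reported here.

```python
# Stored "basic constant": the Rydberg energy in electron volts.
RYDBERG_EV = 13.605693


def hydrogen_level_ev(n: int) -> float:
    """Energy of level n of hydrogen in the simple Bohr model, E_n = -R / n^2."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -RYDBERG_EV / n**2


# Instead of storing a table of levels, regenerate any entry on request.
for n in range(1, 5):
    print(n, round(hydrogen_level_ev(n), 4))
```

One stored constant and a few lines of code replace a table of arbitrary length, and a revised value of the constant updates every derived quantity at once.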



Black (session II) stated that CODATA was setting up a roster or panel
of experts in the field of computerised data storage from whom advice could
be asked for by anyone considering setting up a computerised data bank. A
symposium is being held next year by CODATA for experts in the field of
computerised data storage.






J. P. Boyle 
July, 1972 
at Brunel University, 4 - 7th September, 1972

To accommodate the large audience a tent was erected but the [illegible]
presented were much too small for the size of the arena so that many of the lectures
were difficult to follow.

There were generally four parallel sessions covering a wide variety of
topics. Many of these were of a specialised nature e.g. "The use of computers
in architectural design", "The building of operating systems to control user
programs working in an on-line mode".

Some lectures were given on information retrieval systems, [illegible]
being one used by the U.S. Navy to monitor the state and position of the fleets
throughout the world. This system used specially designed [illegible]
on which maps of any part of the world can be flashed along with the data retrieved.
This is a very complex system on which no expense is spared and systems such as
these will be confined to military establishments for many years to come.

A group from Helsinki gave an interesting talk on editing text for newspapers.
The computer hyphenated words and did some editing operations automatically
but misspelling required user intervention.

An exhibition of peripheral equipment was given. One of the most interesting
exhibits was a teletypewriter which had a cassette tape unit. Information
could be stored and edited on the tape and then, when error free, transmitted
to the computer at high speed.



M. O'Hara



Appendix A7 



Dear Sir, 

An information system for the retrieval of references on Atomic
and Molecular Physics is now available to all members of the University
through any teletype terminal or Visual Display Unit. A VDU is now in
the School of Applied Mathematics and Physics Library.

At present we have nearly 7,000 references in the system, consisting
of papers published over the last two or three years and we are up-to-date
with Physics Abstracts. Each reference consists of an abstract and the
usual bibliographic details of author, journal title, volume, number, etc.

You are now invited to use the system. We think it will be useful
to you and it should allow you to retrieve papers after a search in greater
depth than is possible manually with Physics Abstracts. We would appreciate
your comments - good or bad. Only through practical use of the system can
we hope to assess its facilities.

Instructions for logging in to the system are enclosed. Our
recommendation is that you use the system briefly and then request the user
manual which gives further guidance.

Books in the School Library

A secondary system allows the retrieval of books from the School
Library. Titles and chapter headings may be searched for subjects on
which information is required.

For further details please refer to:- Mrs. Joan Stewart,
extension 489 (G.P.O.).

Yours faithfully,

L. D. Higgins



USING THE VISUAL DISPLAY UNIT

FOR ATOMIC AND MOLECULAR PHYSICS ABSTRACTS

Type  : £JOB,PUBL,ABED1234
Press ESCAPE
MCS   : PASSWORD?
Type  : 8 spaces
Press ESCAPE
MCS   : LOGIN LINE
Type  : £RUN,ICL,ABST,RETR
Press ESCAPE

FOR APPLIED MATHEMATICS AND PHYSICS BOOKS

Type  : £JOB,PUBL,ABED1234
Press ESCAPE
MCS   : PASSWORD?
Type  : 8 spaces
Press ESCAPE
MCS   : LOGIN LINE
Type  : £RUN,ICL,AMPB,RETR
Press ESCAPE

FOR COMPUTER SCIENCE BOOKS

Type  : £JOB,PUBL,ABED1234
Press ESCAPE
MCS   : PASSWORD?
Type  : 8 spaces
Press ESCAPE
MCS   : LOGIN LINE
Type  : £RUN,ICL,CSCB,RETR
Press ESCAPE

User actions are given on the left and are underlined. "Press" refers
to depressing the "ESCAPE" button after each instruction has been typed.

Type  : £RESUME after a "BREAK" message from MCS
Type  : £ENDJOB after "X" has been used to delete the program.



FOR ATOMIC AND MOLECULAR PHYSICS ABSTRACTS

Type  : £JOB,PUBL,ABED1234
Press ACCEPT
MCS   : PASSWORD?
Type  : 8 spaces
Press ACCEPT
MCS   : LOGIN LINE
Type  : £RUN,ICL,ABST,RETR
Press ACCEPT

FOR APPLIED MATHEMATICS AND PHYSICS BOOKS

Type  : £JOB,PUBL,ABED1234
Press ACCEPT
MCS   : PASSWORD?
Type  : 8 spaces
Press ACCEPT
MCS   : LOGIN LINE
Type  : £RUN,ICL,AMPB,RETR
Press ACCEPT

FOR COMPUTER SCIENCE BOOKS

Type  : £JOB,PUBL,ABED1234
Press ACCEPT
MCS   : PASSWORD?
Type  : 8 spaces
Press ACCEPT
MCS   : LOGIN LINE
Type  : £RUN,ICL,CSCB,RETR
Press ACCEPT

User actions are given on the left and are underlined. "Press" refers
to depressing the "ACCEPT" button after each instruction has been typed.

Type  : £RESUME after a "BREAK" message from MCS
Type  : £ENDJOB after "X" has been used to delete the program.
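
The three log-in sequences on these instruction sheets differ only in the program named in the RUN command and in the key pressed after each line. The Python sketch below restates that pattern; it is not part of the original instructions. The function `login_steps` and the dictionary `PROGRAMS` are hypothetical, and the command strings are written as far as they can be read from the sheets, whose print is poor.

```python
# Program named in the RUN command for each data base (as read from the sheets).
PROGRAMS = {
    "atomic and molecular physics abstracts": "ABST",
    "applied mathematics and physics books": "AMPB",
    "computer science books": "CSCB",
}


def login_steps(database: str, key: str = "ACCEPT") -> list[str]:
    """List the lines a user types, each followed by the key to press."""
    program = PROGRAMS[database.lower()]
    typed = [
        "£JOB,PUBL,ABED1234",        # open the job
        " " * 8,                     # reply to PASSWORD? with 8 spaces
        f"£RUN,ICL,{program},RETR",  # start the retrieval program
    ]
    steps = []
    for line in typed:
        steps.append(f"Type  : {line}")
        steps.append(f"Press : {key}")
    return steps


for step in login_steps("Computer Science Books", key="ESCAPE"):
    print(step)
```

Passing `key="ESCAPE"` reproduces the Visual Display Unit sheet, and the default `ACCEPT` reproduces the other sheet.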



To: Console Representative
From: Computer Advisory Service

Please display the enclosed posters adjacent to your
Teletypewriter. They should be attached in sequence as
shown below:-



OPERATING THE CONSOLE 



THE CONSOLE 




STARTING

LOGGING IN
LOGGING OUT
CANCELLING A MESSAGE



CONSOLE MESSAGES 

STOPPING 

FAULTS 






Thank you, 

GPFGG 
Jan, 73, 



Appendix A8 



Annual Colloquium on Computing Technology (Jahreskolloquium zur Rechentechnik)

Tuesday, 29 February 1972

Technische Universitaet Braunschweig
Pockelsstrasse 4 (Main Building), Lecture Hall S4

Abstracts of the Lectures



F. J. Smith: "R.I.O.T.: Retrieval of Information On-line
by Telephone".

We have been building two on-line information systems here in
Belfast: one for the retrieval of references and the other
for the retrieval of numerical atomic data. They are designed
for retrieval via a computer terminal linked to our data bank
through the ordinary telephone network; so anyone who can
dial the Belfast exchange can connect with our system, and
through a simple question and answer form of conversation with
the system can, we hope, retrieve the information he needs.
Much emphasis has been placed in our work on the efficiency
of the software, as the main stumbling block to the development
of this kind of information service is the great cost.
By careful study of the techniques used we think we have
reduced these costs by a considerable factor.



